Abstract

We study semiparametric time series models with innovations following a log-concave distribution. We
propose a general maximum likelihood framework which allows us to estimate simultaneously the
parameters of the model and the density of the innovations. This framework can be easily adapted to
many well-known models, including ARMA, GARCH and ARMA-GARCH. Furthermore, we show that
the estimator under our new framework is consistent in both ARMA and ARMA-GARCH settings. We
demonstrate its finite sample performance via a thorough simulation study and apply it to model a rabbit
population data set.

Key words: shape constraint, log-concavity, maximum likelihood, time series, ARMA, 
GARCH, ARMA-GARCH 

1 Introduction 



Statistical analysis of time series is an important issue in many areas of science. Many existing time series
models postulate Gaussian innovations. Statistical inference is then typically based on the idea of maximum
likelihood estimation. Some well-known examples include autoregressive moving average (ARMA) models
(Brockwell and Davis, 1991) and generalized autoregressive conditionally heteroscedastic (GARCH) models
(Bollerslev, 1986). However, it is known that time series with non-Gaussian innovations frequently occur in
health, social and environmental sciences (cf. Diggle, Liang and Zeger (2002)). Often, the Gaussian quasi-maximum
likelihood estimator (GQMLE) is used to alleviate this issue, and in most circumstances, the
resulting estimates are still consistent (cf. Francq and Zakoïan (2004)). Nevertheless, we argue that there
are circumstances where semiparametric models are preferable, because estimating the distribution function
of the innovations enhances our understanding of the data. For example, utilizing its quantile information
can make the prediction more reliable.



As an early attempt to model the innovation density nonparametrically, Engle and González-Rivera
(1991) proposed a semiparametric autoregressive conditionally heteroscedastic (ARCH) model based on
a nonparametric density estimation technique called discrete maximum penalized likelihood estimation.
Drost, Klaassen and Werker (1997) suggested an adaptive estimator (AE) for ARMA based on the kernel
density estimator. See Kreiss (1987), Sun and Stengos (2006) and Ling and McAleer (2003) for related work
on other time series models. However, we argue that the above-mentioned estimators may potentially suffer
from the following drawbacks:

(a) they mainly focus on estimating the parametric part of the models; 

(b) their finite-sample performances depend heavily on the choice of tuning parameters, especially when the 
sample size is not too large. However, none of the above-cited work gives practical guidelines on how to 
set tuning parameters; 

(c) often some restrictive conditions are imposed; for instance, it is generally assumed that the innovation
distribution has a continuous density function. Furthermore, both Kreiss (1987) and Ling and McAleer
(2003) require the density function of the innovations to be symmetric.



Motivated by recent developments in shape-constrained density estimation, in this paper we take a different
approach by assuming that the innovations have a log-concave density (i.e. the logarithm of the density
function is concave). The class of log-concave densities contains many commonly encountered parametric
families of univariate distributions, including normal, gamma with shape parameter at least 1, Weibull
distributions with shape parameter at least 1, beta($\alpha$, $\beta$) with $\alpha, \beta \ge 1$, logistic, Laplace (double exponential)
and Gumbel; see Bagnoli and Bergstrom (2005) for more examples. Throughout this paper, we denote the
class of log-concave densities by $\mathcal{F}$.
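As a quick illustration of the definition, log-concavity of a candidate density can be screened numerically by checking that the second differences of its log-density are non-positive on a grid. The sketch below is only a necessary-condition diagnostic on a finite interval, not a proof; the function name, the grid and the tolerance are illustrative assumptions, and the Cauchy density is added here as a standard example of a density that is not log-concave.

```python
import math


def is_log_concave_on_grid(log_density, lo, hi, m=2001, tol=1e-9):
    """Return True if every second difference of log_density on an
    evenly spaced grid over [lo, hi] is <= tol (a necessary check only)."""
    h = (hi - lo) / (m - 1)
    vals = [log_density(lo + i * h) for i in range(m)]
    return all(
        vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= tol
        for i in range(1, m - 1)
    )


def log_gauss(x):
    # log of the standard normal density
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_laplace(x):
    # log of the Laplace (double exponential) density with unit scale
    return -abs(x) - math.log(2.0)


def log_cauchy(x):
    # log of the standard Cauchy density (heavy tails, not log-concave)
    return -math.log(math.pi * (1.0 + x * x))


print(is_log_concave_on_grid(log_gauss, -10, 10))
print(is_log_concave_on_grid(log_laplace, -10, 10))
print(is_log_concave_on_grid(log_cauchy, -10, 10))
```

The Gaussian and Laplace log-densities pass the check, while the Cauchy log-density fails because it is convex for $|x| > 1$.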

Our new modelling framework is as follows. Denote a class of separated semiparametric time series models
by $(f, \theta)$, where $f$ is the density function of the independent and identically distributed (i.i.d.) innovations,
and $\theta$ is the parameter vector taking values in a parameter space $\Theta$. Let $l(f, \theta)$ be its log-likelihood function.
Denote the true density of the innovations and the true value of the parameter vector by $f_0$ and $\theta_0$ respectively.
We propose to estimate $f_0$ and $\theta_0$ by

$$(\hat f, \hat\theta) \in \arg\max_{f \in \mathcal{F},\, \theta \in \Theta} l(f, \theta).$$

We call $(\hat f, \hat\theta)$ the log-concave maximum likelihood estimator (LCMLE).
The advantages of our method include the following:

(a) it is free of tuning parameters; 

(b) it simultaneously estimates the density function of the innovations; 

(c) it is straightforward to implement; 

(d) it is easy to adapt to a wide class of time series models with only minor modifications; 

(e) it offers potential improvement over both the GQMLE and the AE in terms of finite sample performance. 

Furthermore, for certain classes of models, we argue that if $f_0$ is log-concave, then both $\hat f$ and $\hat\theta$ are
consistent. Even if $f_0$ is not log-concave (for instance, in the infinite variance ARMA), $\hat\theta$ can still be a
consistent estimator of $\theta_0$. The flexibility and robustness of this semiparametric procedure make it highly
desirable in practice.

Here we list some applicable areas for our procedure. We argue that our approach gives an alternative 
to many of the statistical models listed below. 



(a) Streamflow and other hydrological data: Investigations (Tao, Yevjevich and Kottegoda, 1976) show that
the independent residuals of autoregressive daily flow models have distributions whose tails are not heavier
than exponential. Damsleth and El-Shaarawi (1989) studied ARMA models with Laplace innovations
and used them to model the sulphate concentration in lakes in Ontario, Canada.

(b) Animal populations: Li and McLeod (1988) studied ARMA models with skewed innovations, and fitted
an autoregressive model with gamma innovations to the Canadian lynx data. See also Section 4.4 for a
real data example.

(c) Financial data: The GARCH model with Laplace innovations was shown to be superior to that with
Gaussian innovations by Granger and Ding (1995) for the S&P 500 index. Moreover, Haas, Mittnik and Paolella
(2006) reported that the GARCH model with innovations being the convolution of Laplace and Gaussian
(which is log-concave) offers a plausible description of the daily stock return series in Germany.
Recently, Trindade, Zhu and Andrews (2010) studied ARMA-GARCH models with asymmetric Laplace
innovations and applied them to model real estate returns.

The nonparametric log-concave maximum likelihood density estimator was studied in the i.i.d. setting by
Walther (2002), Pal, Woodroofe and Meyer (2007), Dümbgen and Rufibach (2009), Balabdaoui, Rufibach and Wellner
(2009), Cule, Samworth and Stewart (2010), Cule and Samworth (2010), Schuhmacher, Hüsler and Dümbgen
(2011) and Dümbgen, Hüsler and Rufibach (2011). These references contain characterizations of the estimator,
asymptotics and algorithms for its computation. Regarding its applications, see Dümbgen, Samworth and Schuhmacher
(2011), Rufibach (2012) and Samworth and Yuan (2012), where it has been applied to isotonic/linear
regression, receiver operating characteristic (ROC) curve estimation and independent component analysis.
Yet, to the best of our knowledge, none of the existing work concerns dependent data structures such
as the stochastic processes studied in this paper. In fact, this paper gives very positive answers to the questions
raised recently by Xia and Tong (2010) and Yao (2010). For other popular shape constraints, one may
refer to Groeneboom, Jongbloed and Wellner (2001), Seregin and Wellner (2010) and Koenker and Mizera (2010).

The rest of the paper is organized as follows. In Section 2 we apply our method to the class of ARMA
models. We display in detail how the LCMLE is constructed in Section 2.1. Theoretical results regarding
its existence and consistency are given in Section 2.2. A variant of the LCMLE is suggested in Section 2.3,
which offers further potential improvement in small sample sizes and provides a nice link to the smoothed
log-concave maximum likelihood estimator studied by Dümbgen and Rufibach (2009) and Chen and Samworth
(2012).

Section 3 adapts the framework to a particular nonlinear setting, where ARMA-GARCH models are
considered. The challenge of constructing the LCMLE is taken up in Section 3.1, while results concerning
its existence and consistency are described in Section 3.2. It is worth noting that in Sections 2.2 and 3.2
our theory is developed under both correct and incorrect model specification of the innovation distribution.

Section 4.1 is devoted to the computation of the LCMLE. Simulation studies follow in Sections 4.2 and
4.3, confirming the improved finite sample performance over the GQMLE and the AE in the setting of
non-Gaussian innovations. Moreover, we demonstrate that even in the case where the innovations are Gaussian,
the performance of our LCMLE remains comparable to that of its competitors. Finally, Section 4.4 gives
an application of our methodology to the Yorkshire rabbit (Oryctolagus cuniculus) population data set. We
defer all proofs to the appendix.
2 ARMA models

In this section, we consider the ARMA($p$, $q$) process with observations $\{X_t\}$. The model is defined as

$$X_t = \sum_{i=1}^{p} a_i X_{t-i} + \sum_{i=1}^{q} b_i \epsilon_{t-i} + \epsilon_t,$$

where $\{\epsilon_t\}$ are i.i.d. random variables, and where $a_1, \dots, a_p, b_1, \dots, b_q$ are real coefficients.

Arguably, ARMA models are the most popular linear models used by time series practitioners. See
Brockwell and Davis (1991) for a thorough survey of the background. Our goal in this section is to estimate
the parameters $a_1, \dots, a_p, b_1, \dots, b_q$ and the distribution of $\{\epsilon_t\}$ simultaneously.



2.1 The log-concave maximum likelihood estimator 

Assume that the observations $X_1, \dots, X_n$ are from an ARMA($p$, $q$) process, where the orders $p$ and $q$ are
known. The vector of the parameters

$$\theta = (a^T, b^T)^T = (a_1, \dots, a_p, b_1, \dots, b_q)^T$$

belongs to a parameter space $\Theta \subset \mathbb{R}^{p+q}$.

Let $\theta_0 = (a_0^T, b_0^T)^T = (a_{01}, \dots, a_{0p}, b_{01}, \dots, b_{0q})^T$ and $Q_0$ denote respectively the true value of the
parameter vector and the true distribution of the innovations.

Let $\Phi$ be the family of concave functions $\phi : \mathbb{R} \to [-\infty, \infty)$ which are upper semicontinuous and coercive
in the sense that $\phi(x) \to -\infty$ as $|x| \to \infty$. Furthermore, denote the set of concave log-densities by

$$\Phi_0 = \left\{ \phi \in \Phi : \int e^{\phi(x)}\,dx = 1 \right\}.$$

The following conditions are imposed to construct the LCMLE:

(A.1) $Q_0$ is a distribution with density function $f_0$ and has finite expectation;
(A.2) $\theta_0 \in \Theta$, where $\Theta$ is closed;
(A.3) $\Theta$ is a bounded subset of $\mathbb{R}^{p+q}$.

The log-concave log-likelihood can be expressed as

$$l_n(\phi, \theta) = l_n(\phi, \theta; X_1, \dots, X_n) = \sum_{t=1}^{n} \phi(\epsilon_t(\theta)),$$

where $\phi \in \Phi_0$, $\theta \in \Theta$ and $\{\epsilon_t(\theta)\}$ are the estimated innovations computed recursively by

$$\epsilon_t(\theta) = X_t - \sum_{i=1}^{p} a_i X_{t-i} - \sum_{i=1}^{q} b_i \epsilon_{t-i}(\theta), \quad \text{for } t = 1, \dots, n.$$

The choice of the unknown initial values $X_0, \dots, X_{1-p}, \epsilon_0(\theta), \dots, \epsilon_{1-q}(\theta)$ can be shown to be unimportant
asymptotically (see Appendix). For simplicity, these initial values are taken to be fixed (i.e. neither random
nor functions of the parameters).
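The recursion for the estimated innovations can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and the fixed initial values are set to zero by default, which is one admissible choice among many.

```python
def arma_residuals(x, a, b, x_init=None, e_init=None):
    """Estimated innovations e_t(theta) for t = 1..n, where
    e_t = X_t - sum_i a_i X_{t-i} - sum_i b_i e_{t-i}.

    x      : observations X_1..X_n
    a, b   : AR coefficients a_1..a_p and MA coefficients b_1..b_q
    x_init : fixed values X_0, X_{-1}, ..., X_{1-p} (default zeros, an assumption)
    e_init : fixed values e_0, e_{-1}, ..., e_{1-q} (default zeros, an assumption)
    """
    p, q = len(a), len(b)
    x_past = list(x_init) if x_init is not None else [0.0] * p  # most recent first
    e_past = list(e_init) if e_init is not None else [0.0] * q  # most recent first
    residuals = []
    for x_t in x:
        ar_part = sum(a[i] * x_past[i] for i in range(p))
        ma_part = sum(b[i] * e_past[i] for i in range(q))
        e_t = x_t - ar_part - ma_part
        residuals.append(e_t)
        # Shift the histories so that index 0 is always the latest value.
        x_past = ([x_t] + x_past)[:p]
        e_past = ([e_t] + e_past)[:q]
    return residuals


# ARMA(1,1) with a_1 = b_1 = 0.5 and three observations.
print(arma_residuals([1.0, 2.0, 0.5], [0.5], [0.5]))  # [1.0, 1.0, -1.0]
```

Setting `b = []` recovers the autoregressive case, in which no recursion over past residuals is needed.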
Intuitively, one would seek to maximize $l_n(\phi, \theta)$ over $\Phi_0 \times \Theta$. However, it turns out that this naive
optimization approach is very computationally intensive. We therefore employ the standard trick of adding
a Lagrange term (Silverman, 1982) and propose the following procedure:

(i) Let $(\hat\phi_n, \hat\theta_n)$ be a maximizer of

$$\Lambda_n(\phi, \theta) = \Lambda_n(\phi, \theta; X_1, \dots, X_n) = \frac{1}{n} \sum_{t=1}^{n} \phi(\epsilon_t(\theta)) - \int e^{\phi(x)}\,dx + 1 \tag{2.1}$$

over all $(\phi, \theta) \in \Phi \times \Theta$.

(ii) Return

$$\hat f_n(x) = e^{\hat\phi_n(x)} \quad \text{and} \quad \hat\theta_n, \tag{2.2}$$

where we call $\hat f_n$ and $\hat\theta_n$ the log-concave ARMA maximum likelihood estimator of $f_0$ and $\theta_0$ respectively.

Remark: One can think of $-\int e^{\phi(x)}\,dx + 1$ in (2.1) as a 'Lagrangian' term. For any fixed $\theta$, the maximizer
$\hat\phi_\theta = \arg\max_{\phi \in \Phi} \Lambda_n(\phi, \theta)$ automatically satisfies $\int e^{\hat\phi_\theta(x)}\,dx = 1$, so $e^{\hat\phi_n(x)}$ always defines a density.

2.2 Theoretical properties 

Theorem 2.1 (Existence in ARMA). For every $n > p + q + 1$, under assumptions (A.1) - (A.3), the
LCMLE $(\hat f_n, \hat\theta_n)$ defined in (2.2) exists with probability one.

In the case $q = 0$ (autoregressive models), assumption (A.3) is not needed to guarantee the existence of
the LCMLE. In particular, as is justified by the following corollary, one can just take $\Theta = \mathbb{R}^p$.

Corollary 2.2. If $q = 0$, then for every $n > p + 1$, under assumptions (A.1) - (A.2), the LCMLE $(\hat f_n, \hat\theta_n)$
defined in (2.2) exists with probability one.



Define the ARMA polynomials as follows:

$$A_\theta(z) = 1 - \sum_{i=1}^{p} a_i z^i \quad \text{and} \quad B_\theta(z) = 1 + \sum_{i=1}^{q} b_i z^i. \tag{2.3}$$

To establish the consistency of the LCMLE, we impose two more assumptions:

(A.4) For all $\theta \in \Theta$, $A_\theta(z) B_\theta(z) \neq 0$ for all $z \in \mathbb{C}$ such that $|z| \le 1$;

(A.5) If $p > 0$ and $q > 0$, $A_{\theta_0}(z)$ and $B_{\theta_0}(z)$ have no common roots and $|a_{0p}| + |b_{0q}| \neq 0$.
Remarks: 



1. Under assumption (A.4), it can be shown in the spirit of Proposition 13.3.2 of Brockwell and Davis
(1991) that observations $\{X_t\}$ are drawn from a strictly stationary and ergodic process. It also restricts
our attention to causal and invertible ARMA processes.

2. The ARMA models without assumption (A.5) are not identifiable. Assumption (A.5) also allows for an
overidentification of either $p$ or $q$, but not both.
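For low-order models, the root condition in (A.4) can be checked by hand or with a few lines of code. The sketch below treats only the AR(2) polynomial $A_\theta(z) = 1 - a_1 z - a_2 z^2$ and tests whether both roots lie strictly outside the unit circle; the function names are hypothetical, and the first set of coefficients is the AR(2) setting used later in the simulation study.

```python
import cmath


def ar2_roots(a1, a2):
    """Roots of A(z) = 1 - a1*z - a2*z**2 (requires a2 != 0)."""
    # Rewrite as qa*z^2 + qb*z + qc = 0 and apply the quadratic formula.
    qa, qb, qc = -a2, -a1, 1.0
    disc = cmath.sqrt(qb * qb - 4.0 * qa * qc)
    return ((-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa))


def satisfies_root_condition(a1, a2):
    """True if A(z) has no root in the closed unit disc."""
    return all(abs(z) > 1.0 for z in ar2_roots(a1, a2))


print(satisfies_root_condition(0.5, -0.5))  # roots 0.5 +/- 1.3229i, modulus sqrt(2)
print(satisfies_root_condition(1.5, -0.5))  # roots 1 and 2: a unit root, so it fails
```

For higher orders the same check would be carried out with a general polynomial root finder.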
Define the best log-concave approximation of $Q_0$ as

$$f_0^* = \arg\max_{f \in \mathcal{F}} \int \log f \, dQ_0,$$

where $\mathcal{F}$ is the class of log-concave densities. If $Q_0$ has a log-concave density function $f_0$, then $f_0^* = f_0$.
Otherwise, in the case that $f_0$ has finite entropy, $f_0^*$ is the density function that minimizes the Kullback-Leibler
divergence $D_{KL}(f_0, f) = \int f_0 \log(f_0/f)$ over all $f \in \mathcal{F}$. Therefore, if $f_0$ is not too far away from
log-concave, $f_0^*$ will be reasonably close to $f_0$. More details regarding the properties of $f_0^*$ can be found in
Cule and Samworth (2010) and Chen and Samworth (2012).
Now we are in the position to state the consistency theorem.



Theorem 2.3 (Consistency in ARMA). Let $(\hat f_n, \hat\theta_n)$ be a sequence of LCMLEs defined in (2.2). Under
assumptions (A.1)-(A.5), almost surely

$$\int |\hat f_n(x) - f_0^*(x)|\,dx \to 0 \quad \text{and} \quad \hat\theta_n \to \theta_0. \tag{2.4}$$

When $q = 0$, there is no need to estimate the innovations iteratively, so assumptions can be relaxed to
derive a consistent LCMLE.

Corollary 2.4. Let $(\hat f_n, \hat\theta_n)$ be a sequence of LCMLEs defined in (2.2). If $q = 0$, then under assumptions
(A.1), (A.2) and (A.4), (2.4) holds almost surely.

Remarks:



1. As can be seen from the proofs in the appendix, it is possible to drop the first part of condition (A.1)
(i.e. $Q_0$ has a density function), and replace it by the following slightly weaker condition:

(A.1*) $Q_0$ is not a point mass and has finite first moment.

But then the LCMLE exists only with asymptotic probability one. See also the numerical experiments in
Section 4.3 for more evidence.

2. The convergence of $\hat f_n(x)$ in the $L_1$ norm can be strengthened as follows: suppose that $a : \mathbb{R} \to \mathbb{R}$ is a
sublinear function, i.e. $a(x + y) \le a(x) + a(y)$ and $a(rx) = r a(x)$ for all $x, y \in \mathbb{R}$ and $r \ge 0$, satisfying
$e^{a(x)} f_0^*(x) \to 0$ as $|x| \to \infty$. Then it can be shown that under the conditions of Theorem 2.3,

$$\int e^{a(x)} |\hat f_n(x) - f_0^*(x)|\,dx \to 0 \quad \text{almost surely}$$

(Schuhmacher, Hüsler and Dümbgen, 2011, Theorem 2.1).

3. Unlike the common approaches in the literature, we do not require the variance of $Q_0$ to be finite in
order to establish the consistency of $\hat\theta_n$ for the LCMLE. For other estimators that can handle the infinite
variance ARMA, see Pan, Wang and Yao (2007).



2.3 The smoothed log-concave maximum likelihood estimator 

In this subsection, we describe a variant of the LCMLE. It has some superior properties over the LCMLE
defined in (2.2), is easy to implement, and yet remains computationally feasible.



One problem associated with the LCMLE is that the estimated density function f n is not everywhere 
differentiable on the real line. It is not even continuous on the boundary of its support. In fact, non- 
smoothness is a characteristic feature of shape-constrained maximum likelihood estimators. 

To build an estimator with more attractive visual appearance, and to offer potential improvement in
small sample sizes, Dümbgen and Rufibach (2009) introduced a smoothed (yet still fully automatic) version
of the univariate log-concave maximum likelihood density estimator via convolving with a Gaussian density.
Chen and Samworth (2012) extended this idea to the multivariate setting and studied its theoretical
properties.

In the case that $Q_0$ has finite variance, we can adapt this general idea by modifying Step (ii) of the
ARMA estimation procedure as follows:

(ii) Define the empirical distribution

$$\hat Q_{n,\hat\theta_n} = \frac{1}{n} \sum_{t=1}^{n} \delta_{\epsilon_t(\hat\theta_n)},$$

where $\delta_a$ denotes a Dirac point mass at $a$. Let $\tilde f_n = \hat f_n * \phi_{\hat A_n}$ with

$$\hat A_n = \int x^2 \, d\hat Q_{n,\hat\theta_n}(x) - \int x^2 \hat f_n(x)\,dx,$$

where $*$ is the convolution operator and $\phi_A$ is the univariate normal density with mean zero and
variance $A$. Return $\tilde f_n$ and the same $\hat\theta_n$. We call $(\tilde f_n, \hat\theta_n)$ the smoothed log-concave ARMA maximum
likelihood estimator or simply the smoothed LCMLE.

It can be shown that $\hat A_n$ is always positive, so $\tilde f_n$ is well-defined. We note that the value of $\hat\theta_n$ remains
unchanged, but now $\hat f_n$ is replaced by its slightly smoothed version $\tilde f_n$. All the theoretical results described
in Section 2.2 are still valid. But instead of converging to $f_0^*$ in Theorem 2.3 and Corollary 2.4, $\tilde f_n$ converges
to $f_0^{**}$, i.e. $\int |\tilde f_n(x) - f_0^{**}(x)|\,dx \to 0$ almost surely, where $f_0^{**} = f_0^* * \phi_{A^*}$ with $A^* = \int x^2 f_0(x)\,dx - \int x^2 f_0^*(x)\,dx$ (cf.
Chen and Samworth (2012)). Nevertheless, in the case that $f_0$ is log-concave, $f_0^{**} = f_0^* = f_0$.
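The reason for this particular choice of smoothing variance can be seen from a short calculation. As a sketch, let $Y$ and $Z$ be independent auxiliary random variables (introduced here only for illustration), where $Y$ has the unsmoothed log-concave density estimate and $Z$ is normal with mean zero and variance equal to the smoothing variance $\hat A_n$. Then $Y + Z$ has the smoothed density $\tilde f_n$, and since $E Z = 0$:

```latex
\int x^2 \tilde f_n(x)\,dx
  = E(Y+Z)^2
  = E Y^2 + 2\,(E Y)(E Z) + E Z^2
  = \int x^2 \hat f_n(x)\,dx + \hat A_n
  = \int x^2 \, d\hat Q_{n,\hat\theta_n}(x).
```

In words, the smoothed estimator has the same second moment as the empirical distribution of the estimated innovations, which is what the variance $\hat A_n$ is designed to achieve.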



3 ARMA-GARCH models 



The class of ARCH models was developed by Engle (1982) and generalized by Bollerslev (1986). It is common
in practice to fit ARMA models with GARCH errors, which can be viewed as an extension of both ARMA
and GARCH models. See Francq and Zakoïan (2010) for a nice introduction.

We write the ARMA($p$, $q$)-GARCH($r$, $s$) model as

$$X_t = \sum_{i=1}^{p} a_i X_{t-i} + \sum_{i=1}^{q} b_i \eta_{t-i} + \eta_t,$$

$$\eta_t = \sigma_t \epsilon_t,$$

$$\sigma_t^2 = c + \sum_{i=1}^{r} \alpha_i \eta_{t-i}^2 + \sum_{i=1}^{s} \beta_i \sigma_{t-i}^2,$$

where innovations $\{\epsilon_t\}$ are i.i.d. random variables with unit second moment (i.e. $E\epsilon_t^2 = 1$). Here $c > 0$,
$\alpha_i \ge 0$ for $i = 1, \dots, r$ and $\beta_i \ge 0$ for $i = 1, \dots, s$.

A primary feature of this class of models is that it allows the conditional variance of the errors to
change over time. Often the distribution of $\{\epsilon_t\}$ is assumed to be standard normal, so that estimates of
all the parameters can be derived by maximizing the conditional log-likelihood. If the distribution of $\{\epsilon_t\}$
is misspecified, maximizing the Gaussian quasi-log-likelihood still gives consistent estimates of these parameters
(Francq and Zakoïan, 2004), but is occasionally inefficient. Non-Gaussian quasi-maximum likelihood
estimators also exist in the literature, but they may lead to inconsistent estimates if the distribution of
the innovations is misspecified (Newey and Steigerwald, 1997). In the following, we tackle the problem by
assuming that the innovations $\{\epsilon_t\}$ have a log-concave density.



3.1 The log-concave maximum likelihood estimator 

Suppose that the observations $X_1, \dots, X_n$ constitute a realization of an ARMA($p$, $q$)-GARCH($r$, $s$) process,
where the orders $p$, $q$, $r$ and $s$ are assumed to be known. The vector of the parameters

$$\theta = (a^T, b^T, c, \alpha^T, \beta^T)^T = (a_1, \dots, a_p, b_1, \dots, b_q, c, \alpha_1, \dots, \alpha_r, \beta_1, \dots, \beta_s)^T$$

belongs to a parameter space of form $\Theta \subset \mathbb{R}^{p+q} \times (0, \infty) \times [0, \infty)^{r+s}$.

Both the true distribution of $\{\epsilon_t\}$ and the true value of the parameter vector are unknown and to be
estimated. They are denoted respectively by $Q_0$ and

$$\theta_0 = (a_0^T, b_0^T, c_0, \alpha_0^T, \beta_0^T)^T = (a_{01}, \dots, a_{0p}, b_{01}, \dots, b_{0q}, c_0, \alpha_{01}, \dots, \alpha_{0r}, \beta_{01}, \dots, \beta_{0s})^T.$$

In order to construct the LCMLE, we impose the following conditions:

(B.1) $Q_0$ has unit second moment and a density function $f_0$;
(B.2) $\theta_0 \in \Theta$ and $\Theta$ is compact.

Remark: Without loss of generality, we can assume in the rest of the paper that (B.2) holds true when
the parameter space is of form

$$\Theta = [-1/\delta, 1/\delta]^{p+q} \times [\delta, 1/\delta] \times [0, 1/\delta]^{r+s} \subset \mathbb{R}^{p+q+r+s+1}$$

for some known sufficiently small $\delta \in (0, 1)$.

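Now the log-concave log-likelihood of ARMA-GARCH can be expressed as

$$l_n(\phi, \theta) = l_n(\phi, \theta; X_1, \dots, X_n) = \sum_{t=1}^{n} \phi\!\left( \frac{\eta_t(\theta)}{\sqrt{\sigma_t^2(\theta)}} \right) - \frac{1}{2} \sum_{t=1}^{n} \log \sigma_t^2(\theta),$$

where $\phi \in \Phi_0$, $\theta \in \Theta$, and $\{\eta_t(\theta)\}$ and $\{\sigma_t^2(\theta)\}$ are defined recursively by

$$\eta_t(\theta) = X_t - \sum_{i=1}^{p} a_i X_{t-i} - \sum_{i=1}^{q} b_i \eta_{t-i}(\theta),$$

$$\sigma_t^2(\theta) = c + \sum_{i=1}^{r} \alpha_i \eta_{t-i}^2(\theta) + \sum_{i=1}^{s} \beta_i \sigma_{t-i}^2(\theta).$$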

If $r > q$, the required initial values are $X_0, \dots, X_{1-(r-q)-p}$, $\eta_{q-r}(\theta), \dots, \eta_{1-r}(\theta)$, $\sigma_0^2(\theta), \dots, \sigma_{1-s}^2(\theta)$; otherwise,
they are $X_0, \dots, X_{1-p}$, $\eta_0(\theta), \dots, \eta_{1-q}(\theta)$, $\sigma_0^2(\theta), \dots, \sigma_{1-s}^2(\theta)$. As is shown in the appendix, the
choice of these unknown initial values is asymptotically irrelevant to our final estimates. To simplify the
analysis, we take them to be fixed.
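The two recursions behind the ARMA-GARCH log-likelihood (ARMA residuals and conditional variances) can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and the fixed initial values (zeros for the pre-sample observations and residuals, a constant `sigma2_init` for the pre-sample variances) are assumptions made here for concreteness.

```python
def arma_garch_recursion(x, a, b, c, alpha, beta, sigma2_init=1.0):
    """Return (eta, sigma2) for t = 1..n, where
        eta_t     = X_t - sum_i a_i X_{t-i} - sum_i b_i eta_{t-i},
        sigma_t^2 = c + sum_i alpha_i eta_{t-i}^2 + sum_i beta_i sigma_{t-i}^2.
    Pre-sample X and eta are fixed at 0, pre-sample sigma^2 at sigma2_init
    (illustrative choices; any fixed values could be used).
    """
    p, q, r, s = len(a), len(b), len(alpha), len(beta)
    keep = max(q, r)                 # how many past residuals are needed
    x_past = [0.0] * p               # most recent first
    eta_past = [0.0] * keep
    sig_past = [sigma2_init] * s
    eta, sigma2 = [], []
    for x_t in x:
        s2 = (c
              + sum(alpha[i] * eta_past[i] ** 2 for i in range(r))
              + sum(beta[i] * sig_past[i] for i in range(s)))
        e = (x_t
             - sum(a[i] * x_past[i] for i in range(p))
             - sum(b[i] * eta_past[i] for i in range(q)))
        eta.append(e)
        sigma2.append(s2)
        x_past = ([x_t] + x_past)[:p]
        eta_past = ([e] + eta_past)[:keep]
        sig_past = ([s2] + sig_past)[:s]
    return eta, sigma2


# Pure GARCH(1,1) (p = q = 0) with c = 0.5, alpha_1 = 0.5, beta_1 = 0.5.
print(arma_garch_recursion([1.0, 2.0], [], [], 0.5, [0.5], [0.5]))
# ([1.0, 2.0], [1.0, 1.5])
```

The standardized residuals entering the log-likelihood are then `eta[t] / sigma2[t] ** 0.5`.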
Let $\Phi_1$ be a subset of $\Phi$ such that

$$\Phi_1 = \left\{ \phi \in \Phi : \int e^{\phi(x)}\,dx = 1, \ \int x^2 e^{\phi(x)}\,dx = 1 \right\}.$$

Naturally, one would attempt to maximize $l_n(\phi, \theta)$ over $\Phi_1 \times \Theta$. However, it is hard to enforce all the constraints
simultaneously. Therefore we seek to reformulate the optimization problem.

Our approach is motivated by the following identifiability property of the ARMA-GARCH process: if we
replace $(f(\cdot), a, b, c, \alpha, \beta)$ by $(\sqrt{k} f(\sqrt{k}\,\cdot), a, b, kc, k\alpha, \beta)$ for any constant $k \in (0, \infty)$, the ARMA-GARCH
process remains unchanged. Indeed, every $\sigma_t^2$ is multiplied by $k$, so each innovation is divided by $\sqrt{k}$ while $\eta_t = \sigma_t \epsilon_t$ is unaffected.
Therefore we can enforce the constant $c$ to be one in Step (i) of the following
procedure and transform it back in Step (iii):

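(i) Define the transformed parameter space

$$\Theta' = [-1/\delta, 1/\delta]^{p+q} \times \{1\} \times [0, 1/\delta^2]^{r} \times [0, 1/\delta]^{s}.$$

Let $(\hat\phi'_n, \hat a_n, \hat b_n, 1, \hat\alpha'_n, \hat\beta_n)$ be a maximizer over $(\phi, \theta) \in \Phi \times \Theta'$ of

$$\Lambda_n(\phi, \theta) = \Lambda_n(\phi, \theta; X_1, \dots, X_n) = \frac{1}{n} \sum_{t=1}^{n} \phi\!\left( \frac{\eta_t(\theta)}{\sqrt{\sigma_t^2(\theta)}} \right) - \frac{1}{2n} \sum_{t=1}^{n} \log \sigma_t^2(\theta) - \int e^{\phi(x)}\,dx + 1. \tag{3.2}$$

For convenience, we denote $(\hat a_n^T, \hat b_n^T, 1, (\hat\alpha'_n)^T, \hat\beta_n^T)^T$ by $\hat\theta'_n$.

(ii) Set

$$\hat c_n = \frac{1}{n} \sum_{t=1}^{n} \frac{\eta_t^2(\hat\theta'_n)}{\sigma_t^2(\hat\theta'_n)}.$$

(iii) Return

$$\hat f_n(x) = \sqrt{\hat c_n}\, e^{\hat\phi'_n(\sqrt{\hat c_n}\, x)} \quad \text{and} \quad \hat\theta_n = (\hat a_n^T, \hat b_n^T, \hat c_n, \hat c_n (\hat\alpha'_n)^T, \hat\beta_n^T)^T, \tag{3.3}$$

where $(\hat f_n, \hat\theta_n)$ is called the log-concave ARMA-GARCH maximum likelihood estimator of $(f_0, \theta_0)$.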
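Steps (ii) and (iii) amount to a simple rescaling of a fit obtained with $c$ fixed at one. The sketch below assumes that the scale estimate in Step (ii) is the average of the squared standardized residuals $\eta_t^2/\sigma_t^2$ evaluated at the Step (i) fit; the function name and argument names are hypothetical, and the log-density from Step (i) is passed in as an ordinary Python function.

```python
import math


def back_transform(eta, sigma2, alpha_prime, beta, log_density_prime):
    """Rescale a fit obtained with c fixed at one.

    eta, sigma2       : residuals and conditional variances at the Step (i) fit
    alpha_prime, beta : GARCH coefficients from Step (i)
    log_density_prime : callable, the fitted log-density from Step (i)
    Returns (c_hat, alpha_hat, beta_hat, density).
    """
    n = len(eta)
    # Assumed form of Step (ii): mean squared standardized residual.
    c_hat = sum(e * e / s2 for e, s2 in zip(eta, sigma2)) / n
    alpha_hat = [c_hat * al for al in alpha_prime]
    root_c = math.sqrt(c_hat)

    def density(x):
        # Density of a variable with log-density log_density_prime, divided by sqrt(c_hat).
        return root_c * math.exp(log_density_prime(root_c * x))

    return c_hat, alpha_hat, list(beta), density


# Toy check: if Step (i) returned the N(0, 4) log-density and c_hat = 4,
# the rescaled density is the standard normal density.
def log_n04(y):
    return -y * y / 8.0 - 0.5 * math.log(8.0 * math.pi)


c_hat, alpha_hat, beta_hat, f_hat = back_transform([2.0, -2.0], [1.0, 1.0], [0.5], [0.3], log_n04)
print(c_hat, alpha_hat, beta_hat, f_hat(0.0))
```

The ARMA coefficients are not affected by the rescaling, which is why they do not appear in the function.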
Remarks:

1. The function $\hat f_n$ is always a probability density function. Though it is not guaranteed that $\int x^2 \hat f_n(x)\,dx = 1$,
we show in Section 3.2 that this statement is asymptotically true if $f_0$ is log-concave.

2. By making use of the smoothed log-concave density estimator, it is easy to modify the above steps to
enforce the second moment of the estimated innovation distribution to be exactly one. See Section 3.3 for
more details.

3. By setting $p = q = 0$, the above procedure can be used for pure GARCH processes.

3.2 Theoretical properties 

The existence of the LCMLE under the ARMA-GARCH setting is stated in the next theorem. 

Theorem 3.1 (Existence in ARMA-GARCH). For every $n > p + q + r + s + 1$, under assumptions (B.1)
- (B.2), the LCMLE $(\hat f_n, \hat\theta_n)$ defined in (3.3) exists with probability one.

In addition to the ARMA polynomials mentioned in Section 2.2, we define the GARCH polynomials as

$$\mathcal{A}_\theta(z) = \sum_{i=1}^{r} \alpha_i z^i \quad \text{and} \quad \mathcal{B}_\theta(z) = 1 - \sum_{i=1}^{s} \beta_i z^i.$$

To show strong consistency, several mild assumptions are needed:

(B.3) For all $\theta \in \Theta$, $\sum_{i=1}^{s} \beta_i < 1$;

(B.4) The GARCH($r$, $s$) process with the innovation distribution $Q_0$ and the parameter vector $(c_0, \alpha_0^T, \beta_0^T)^T$
is strictly stationary and ergodic;

(B.5) If $s > 0$, $\mathcal{A}_{\theta_0}(z)$ and $\mathcal{B}_{\theta_0}(z)$ have no common roots, $\mathcal{A}_{\theta_0}(1) \neq 0$ and $\alpha_{0r} + \beta_{0s} \neq 0$.

Remarks:

1. It can be shown that the assumption (B.3) is weaker than assuming strict stationarity of the GARCH
processes over $\Theta$. For instance, see Corollary 2.2 of Francq and Zakoïan (2010).

2. A necessary and sufficient condition for the assumption (B.4) was established by Bougerol and Picard
(1992) in terms of the top Lyapunov exponent. A more interpretable sufficient condition was given by
Bollerslev (1986), namely, $\sum_{i=1}^{r} \alpha_{0i} + \sum_{i=1}^{s} \beta_{0i} < 1$. Note that Bollerslev's condition also implies
second-order stationarity of GARCH, but here we do not need such a strong condition to establish the consistency
of our LCMLE.

3. Assumption (B.5) ensures that the GARCH part of the model is identifiable. This assumption also
allows for an overidentification of either $r$ or $s$. We refer to Remark 2.4 of Francq and Zakoïan (2004) for
a detailed discussion.



Theorem 3.2 (Consistency in ARMA-GARCH). Let $(\hat f_n, \hat\theta_n)$ be a sequence of LCMLEs given by (3.3).
Under assumptions (B.1)-(B.5) and (A.4)-(A.5), almost surely

$$\int |\hat f_n(x) - f_0^*(x)|\,dx \to 0 \quad \text{and} \quad \hat\theta_n \to \theta_0,$$

as $n \to \infty$. Moreover, if $f_0$ is log-concave, then

$$\int x^2 \hat f_n(x)\,dx \to 1, \quad \text{a.s.} \tag{3.4}$$
Remarks:

1. In the above theorem, (B.1) can be replaced by the following weaker condition:

(B.1*) $Q_0$ is nondegenerate, has unit second moment and is supported at more than two points.

But then one can show that the LCMLE exists only with asymptotic probability one.

2. It was shown by Francq and Zakoïan (2004) that the GQMLE for ARMA-GARCH is inconsistent if
$E\epsilon_t \neq 0$. However, the zero-mean condition is not required here to ensure the consistency of our LCMLE. Still, the
asymptotic distribution of our LCMLE remains to be investigated further.



3.3 The smoothed log-concave maximum likelihood estimator 

Analogous to Section 2.3, the idea of smoothing can be adapted to Step (iii) of the ARMA-GARCH estimation
procedure by changing it as follows:

(iii) Compute $(\hat f_n, \hat\theta_n)$ in the same way as before. Set $\hat A_n = 1 - \int x^2 \hat f_n(x)\,dx$ and $\tilde f_n = \hat f_n * \phi_{\hat A_n}$ (N.B.
one can prove $\hat A_n > 0$). Return $\tilde f_n$ and the same $\hat\theta_n$. We call $(\tilde f_n, \hat\theta_n)$ the smoothed log-concave
ARMA-GARCH maximum likelihood estimator.

One nice feature of this new estimator is that the unit second moment constraint is always satisfied, i.e.
$\int x^2 \tilde f_n(x)\,dx = 1$. Again, Theorem 3.1 and Theorem 3.2 are still valid, but $\tilde f_n$ converges to $f_0^{**}$ instead of $f_0^*$
in Theorem 3.2.



4 Computational issues and simulation results 



4.1 Computational issues 

Computing the LCMLEs proposed in Section 2 and Section 3 is fast and straightforward, especially when
the orders of the processes are not too high. To see this, we note that the parametric part of the LCMLEs
can be expressed as

$$\hat\theta_n \in \arg\max_{\theta \in \Theta} T_n(\theta) \quad \text{or} \quad \hat\theta'_n \in \arg\max_{\theta \in \Theta'} T_n(\theta),$$

with $T_n(\theta) = \sup_{\phi \in \Phi} \Lambda_n(\phi, \theta)$. It is shown in the appendix that $T_n(\theta)$ is a continuous function. Therefore,
the optimization problem can be divided into two parts:

1. for a given fixed $\theta$, find $\phi \in \Phi$ that maximizes $\Lambda_n(\phi, \theta)$;

2. for a given continuous function $T_n(\theta)$ on a finite-dimensional compact set (i.e. $\Theta$ or $\Theta'$), find its
maximizer.

The first part is a convex optimization problem, where the optimal $\phi \in \Phi$ can be found very quickly by
an active set algorithm implemented in the R package logcondens (Dümbgen and Rufibach, 2011). More
details on its implementation can be found in Dümbgen, Hüsler and Rufibach (2011).

The second part is a continuous function optimization problem. Many well-known optimization algorithms
can be utilized, including the downhill simplex algorithm (Nelder and Mead, 1965) and differential
evolution (Price, Storn and Lampinen, 2005). When initial guesses are needed for $\theta$, one reasonable choice
would be the least squares (LS) estimate of $\theta_0$.

In the following studies, we used the downhill simplex algorithm for optimization, because it suffices for
our purpose and is typically much faster than differential evolution.
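The two-part structure (an inner maximization over the log-density for fixed parameters, and an outer maximization of the resulting profile objective) can be illustrated on an AR(1) model. The sketch below is deliberately simplified and is not the procedure used in this paper: the inner step maximizes over a Laplace location-scale family only, as a stand-in for the full class of concave log-densities (which would require an active set solver such as the one in logcondens), and the outer step is a plain grid search rather than the downhill simplex algorithm. All function names are hypothetical.

```python
import math
import random
import statistics


def ar1_residuals(x, a):
    """e_t(a) = X_t - a * X_{t-1}, with the fixed initial value X_0 = 0."""
    prev, out = 0.0, []
    for x_t in x:
        out.append(x_t - a * prev)
        prev = x_t
    return out


def profile_objective(x, a):
    """Inner step (stand-in): maximize the mean log-likelihood over the
    Laplace location-scale family and return the attained value."""
    e = ar1_residuals(x, a)
    m = statistics.median(e)                      # Laplace MLE of location
    scale = sum(abs(v - m) for v in e) / len(e)   # Laplace MLE of scale
    return -math.log(2.0 * scale) - 1.0


def fit_ar1(x, grid_size=199):
    """Outer step: maximize the profile objective over a grid on (-1, 1)."""
    grid = [-0.99 + 1.98 * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda a: profile_objective(x, a))


# Simulate an AR(1) series with coefficient 0.5 and Laplace innovations
# (the difference of two independent Exp(1) draws is Laplace distributed).
rng = random.Random(0)
x, prev = [], 0.0
for _ in range(500):
    eps = rng.expovariate(1.0) - rng.expovariate(1.0)
    prev = 0.5 * prev + eps
    x.append(prev)

a_hat = fit_ar1(x)
print(a_hat)
```

Replacing `profile_objective` by a solver for the full log-concave inner problem, and the grid search by a simplex or differential evolution routine, gives the structure described above.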



4.2 Finite sample performance I: comparing with the GQMLE 

To examine the finite sample performance of our method (in estimating the parametric part of the model),
we run simulation experiments on a variety of ARMA, GARCH and ARMA-GARCH models. Both the
centered exponential innovations (i.e. $f_0(x) = e^{-(x+1)}$, $x \ge -1$) and the standard Gaussian innovations (i.e.
$f_0(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, $x \in \mathbb{R}$) are considered. We set the number of observations $n = 1000$. Models that we
consider, together with their corresponding true values of parameters, are listed in Table 1. These values are
picked in such a way that all assumptions from Sections 2 and 3 are satisfied.



Linear models
  MA(1):                  b_01 = 0.5
  AR(2):                  a_01 = 0.5, a_02 = -0.5
  ARMA(1,1):              a_01 = 0.5, b_01 = 0.5
  ARMA(3,2):              a_01 = 0.75, a_02 = -0.5, a_03 = 0.25, b_01 = 0.75, b_02 = 0.25

Nonlinear models
  ARCH(1):                c_0 = 2, α_01 = 0.5
  ARCH(2):                c_0 = 1, α_01 = 0.5, α_02 = 0.5
  GARCH(1,1):             c_0 = 2, α_01 = 0.5, β_01 = 0.5
  GARCH(3,2):             c_0 = 0.5, α_01 = 0.3, α_02 = 0.1, α_03 = 0.2, β_01 = 0.2, β_02 = 0.1
  ARMA(1,1)-GARCH(1,1):   a_01 = 0.5, b_01 = 0.5, c_0 = 0.5, α_01 = 0.5, β_01 = 0.5

Table 1: Different time series models considered in the simulation study.



The results obtained in 1000 simulations by the LCMLE are given in Table 2 in terms of the estimated
root-mean-square error (RMSE). Here RMSE is defined as $\sqrt{E\|\hat\theta_n - \theta_0\|_2^2}$, where $\|\cdot\|_2$ is the Euclidean
norm. The estimates from the GQMLE are shown for comparison. The least squares estimator (LS) is
omitted here because its performance is no better than that of the GQMLE.



                          Estimated RMSE
Models                    centered exponential      Gaussian
                          LCMLE      GQMLE          LCMLE      GQMLE
MA(1)                     0.0026     0.0282         0.0287     0.0271
AR(2)                     0.0034     0.0392         0.0423     0.0395
ARMA(1,1)                 0.0056     0.0497         0.0521     0.0485
ARMA(3,2)                 0.1019     0.2298         0.2519     0.2399
ARCH(1)                   0.1807     0.3155         0.1686     0.1510
ARCH(2)                   0.1151     0.2866         0.1656     0.1500
GARCH(1,1)                0.1882     0.7686         0.4727     0.4423
GARCH(3,2)                0.1044     0.3446         0.2254     0.2217
ARMA(1,1)-GARCH(1,1)      0.0700     0.2588         0.1599     0.1478

Table 2: Estimated root-mean-squared error (RMSE) of the LCMLE and the GQMLE in different models with
different types of innovations.

These results suggest that if the true innovations are non-Gaussian but log-concave, the LCMLE offers
substantial improvement over the GQMLE. Strikingly, the reduction in RMSE ranges from roughly 40% to 90% in
the case where the innovations follow the centered exponential distribution. Even if the true distribution of
the innovations is Gaussian, our LCMLE's performance is still comparable to the GQMLE's, indicating that
one pays little price for assuming only that the innovations are log-concave, rather than Gaussian.
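The relative reductions in RMSE can be recomputed directly from the centered exponential columns of Table 2. The short script below does this arithmetic; the variable names are illustrative, and the pairs are listed in the row order of the table.

```python
# (LCMLE, GQMLE) estimated RMSEs for centered exponential innovations,
# in the row order of Table 2 (MA(1) first, ARMA(1,1)-GARCH(1,1) last).
rmse_pairs = [
    (0.0026, 0.0282), (0.0034, 0.0392), (0.0056, 0.0497), (0.1019, 0.2298),
    (0.1807, 0.3155), (0.1151, 0.2866), (0.1882, 0.7686), (0.1044, 0.3446),
    (0.0700, 0.2588),
]

# Relative reduction in RMSE achieved by the LCMLE over the GQMLE.
reductions = [1.0 - lcmle / gqmle for lcmle, gqmle in rmse_pairs]
smallest, largest = min(reductions), max(reductions)
print(f"{100 * smallest:.1f}% to {100 * largest:.1f}%")
```

With these values, the smallest reduction occurs for ARCH(1) (about 43%) and the largest for AR(2) (about 91%).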



4.3 Finite sample performance II: comparing with the AE 

In this subsection, we run a small numerical study to compare the performance of our LCMLE with that of
the adaptive estimator (AE) proposed by Drost, Klaassen and Werker (1997) in estimating the parametric
part of the model. For simplicity, we consider the AR(1) model with the true parameter $a_{01} = 0.5$. Different
types of innovations considered together with their features are listed in Table 3.



                                                   Features
Type of the innovations                            log-concave   symmetric   discrete component(s)
Centered exponential                               ✓             ✗           ✗
Laplace (double exponential)                       ✓             ✓           ✗
Standard Gaussian N(0, 1)                          ✓             ✓           ✗
Student's t                                        ✗             ✓           ✗
Mixture of Gaussian N(0, 1) and a point mass δ_0   ✗             ✓           ✓
Centered Binomial B(2, 0.5) - 1                    ✗             ✓           ✓

Table 3: Different types of innovations considered and summary of their features.



For the purpose of comparison, we scaled all the variances to one. We consider the two sample sizes 
$n = 100$ and $n = 1000$. To implement the AE, we used the GQMLE as an initial estimator, together with 
the kernel density estimator with the logistic kernel and the bandwidth given in Silverman (1986, page 47). 

The results obtained in 1000 simulations are given in Table 4 and Table 5 in terms of the empirical bias 
and the estimated standard deviation (SD). Surprisingly, the LCMLE performs substantially better than the 
AE when the innovations have a log-concave but non-Gaussian density, even though the AE is efficient in 
the asymptotic sense. We believe that this is largely due to the difficulty in choosing the tuning parameters 
for the AE in finite samples. The effect is exaggerated if $f_0$ is non-symmetric or has bounded support. 
It is also interesting to note the robustness of the LCMLE to misspecification of log-concavity, as it still 
outperforms the AE for $t_3$ innovations at reasonably large $n$. The most striking improvement occurs when 
the distribution function of the innovations is not absolutely continuous. This is due to the fact that the 
AE requires the existence of a density, which is not satisfied in the last two cases. Consequently, the AE 
performs poorly. On the other hand, the GQMLE performs fairly well in these settings, but our LCMLE 
performs even better, although the log-concavity assumption is violated. 
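The way an empirical bias and SD of this kind are obtained can be sketched as follows. This is only an illustration under stated assumptions: it uses the conditional least squares estimator for AR(1) (which maximizes the conditional Gaussian quasi-likelihood in this model) as a stand-in baseline, not the LCMLE or the AE, and the function names, burn-in length, seed and number of replications are hypothetical choices rather than the settings of the study.

```python
import random
import statistics


def simulate_ar1(n, a, rng, burn_in=200):
    """Simulate X_t = a * X_{t-1} + eps_t with centered exponential (variance one) innovations."""
    x = 0.0
    series = []
    for t in range(burn_in + n):
        eps = rng.expovariate(1.0) - 1.0  # Exp(1) shifted to have mean zero
        x = a * x + eps
        if t >= burn_in:
            series.append(x)
    return series


def least_squares_ar1(series):
    """Conditional least squares estimate of the AR(1) coefficient (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


def bias_and_sd(n, a, replications, seed=0):
    """Empirical bias and standard deviation of the estimator over repeated simulations."""
    rng = random.Random(seed)
    estimates = [least_squares_ar1(simulate_ar1(n, a, rng)) for _ in range(replications)]
    return statistics.mean(estimates) - a, statistics.stdev(estimates)


bias, sd = bias_and_sd(n=100, a=0.5, replications=300)
print(f"bias = {bias:.4f}, SD = {sd:.4f}")
```

Replacing `least_squares_ar1` by any other estimator of the autoregressive coefficient gives the corresponding row of a bias/SD table.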

This conclusion is reconfirmed in Figure 1 and Figure 2, where boxplots of the absolute errors for different 
estimators of $a_{01}$ based on $n = 100$ and $n = 1000$ observations in the above setting are given. Finally, we remark 
that similar boxplots can be obtained under the setting of other ARMA/ARMA-GARCH models. 



4.4 Real data example 



Here we illustrate our methodology on the rabbit population data set of Middleton (1934), freely available 
at http://www.sw.ic.ac.uk/cpb/cpb/gpdd.html. The numbers of rabbits killed yearly on a large estate in 
Yorkshire, England from 1867 to 1928 were recorded in this data set. Data were log-transformed and 








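| Estimator | Exponential: Bias | Exponential: SD | Laplace: Bias | Laplace: SD | Gaussian: Bias | Gaussian: SD |
|---|---|---|---|---|---|---|
| LCMLE | 0.0028 | 0.0204 | -0.0140 | 0.0790 | -0.0210 | 0.1052 |
| AE | 0.0070 | 0.0704 | -0.0123 | 0.0899 | -0.0191 | 0.1000 |
| GQMLE | -0.0190 | 0.0851 | -0.0208 | 0.0899 | -0.0269 | 0.0889 |

| Estimator | $t_3$: Bias | $t_3$: SD | Mixture: Bias | Mixture: SD | Binomial: Bias | Binomial: SD |
|---|---|---|---|---|---|---|
| LCMLE | -0.0161 | 0.0830 | -2 x 10^(-?) | 2 x 10^(-?) | -1 x 10^(-?) | 1 x 10^(-?) |
| AE | -0.0167 | 0.0922 | -0.0016 | 0.0592 | -0.0937 | 0.6609 |
| GQMLE | -0.0258 | 0.0889 | -0.0247 | 0.0825 | -0.0258 | 0.0923 |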



Table 4: The empirical bias and SD of the LCMLE, the AE and the GQMLE in AR(1) with n = 100 observations. 





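| Estimator | Exponential: Bias | Exponential: SD | Laplace: Bias | Laplace: SD | Gaussian: Bias | Gaussian: SD |
|---|---|---|---|---|---|---|
| LCMLE | 0.0004 | 0.0020 | -0.0012 | 0.0209 | -0.0031 | 0.0307 |
| AE | 0.0087 | 0.0204 | -0.0009 | 0.0251 | -0.0018 | 0.0300 |
| GQMLE | -0.0024 | 0.0266 | -0.0017 | 0.0276 | -0.0034 | 0.0280 |

| Estimator | $t_3$: Bias | $t_3$: SD | Mixture: Bias | Mixture: SD | Binomial: Bias | Binomial: SD |
|---|---|---|---|---|---|---|
| LCMLE | 0.0004 | 0.0209 | -2 x 10^(-7) | 1 x 10^(-5) | -1 x 10^(-6) | 1 x 10^(-5) |
| AE | -0.0002 | 0.0246 | 0.0025 | 0.0466 | -0.1553 | 0.7116 |
| GQMLE | -0.0017 | 0.0276 | -0.0012 | 0.0269 | -0.0017 | 0.0282 |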



Table 5: The empirical bias and SD of the LCMLE, the AE and the GQMLE in AR(1) with n = 1000 observations. 



centered. This transformation is commonly used in population ecology thanks to the multiplicative nature 
of the population dynamics processes involving birth and death. Figure 3(a) shows the transformed series. 
Its partial autocorrelation function (PACF) is plotted in Figure 3(b). Note that the PACF is still a useful 
tool to help identify the appropriate order of AR(p) processes even if $Q_0$ is non-Gaussian (see Theorem 8.1.2 
of Brockwell and Davis (1991)). The PACF plot hints that we could summarize the series by a first-order 
autoregressive (AR(1)) model 

$$X_t = a X_{t-1} + \epsilon_t,$$

where $\{\epsilon_t\}$ are i.i.d. innovations following an unknown distribution $Q_0$. 
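The order-identification step via the PACF can be sketched as below. The sample PACF is computed with the Durbin-Levinson recursion, a standard technique that is swapped in here for illustration; the paper does not say how its PACF plot was produced. Since the rabbit data are not reproduced in this text, the sketch uses a simulated AR(1) series with centered exponential innovations, and all function names are hypothetical.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (biased version, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations at lags 0..max_lag via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    phi_prev = []  # phi_prev[j - 1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Simulated stand-in for a centered series: AR(1) with a = 0.5 and non-Gaussian innovations.
rng = random.Random(0)
series, x = [], 0.0
for _ in range(2200):
    x = 0.5 * x + (rng.expovariate(1.0) - 1.0)
    series.append(x)
series = series[200:]  # drop a burn-in stretch

pacf = sample_pacf(series, max_lag=4)
print([round(v, 3) for v in pacf])
```

For an AR(1) process the partial autocorrelations beyond lag 1 should be close to zero, which is the pattern one looks for in a plot such as Figure 3(b).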

It can be shown that it is inadequate to summarize this series using AR(1) with Gaussian innovations. In 
fact, a Shapiro-Wilk test on the residuals gives strong evidence against the normality assumption (p-value 
= 0.0015). One alternative is to refit the model with innovations of other parametric forms, but one still has 
to choose the parametric family of the innovations beforehand. Here our approach offers a new possibility. 
By adapting the autoregressive models into our framework, we have fitted the AR(1) with $\hat{a}_{\mathrm{LCMLE}} = 0.5635$. 
The estimated density functions corresponding to both the unsmoothed and the smoothed LCMLE are plotted in 
Figure 3(c). A quantile-quantile (Q-Q) plot of the residuals (obtained from the LCMLE) against the quantiles 
of the fitted unsmoothed LCMLE is shown in Figure 3(d), indicating that the log-concavity assumption 
on $Q_0$ seems to be adequate here. The corresponding Q-Q plot against the fitted smoothed LCMLE appears 
to be similar, so is omitted for brevity. 
Figure 1: Boxplots of the absolute errors for different estimators of $a_{01}$ based on $n = 100$ observations in the setting 
of AR(1) ($a_{01} = 0.5$) with different types of innovations: (a) centered exponential; (b) Laplace; (c) Gaussian; (d) 
Student's $t_3$; (e) mixture of Gaussian and a point mass; (f) centered binomial. 
Figure 2: Boxplots of the absolute errors for different estimators of $a_{01}$ based on $n = 1000$ observations in the setting 
of AR(1) ($a_{01} = 0.5$) with different types of innovations: (a) centered exponential; (b) Laplace; (c) Gaussian; (d) 
Student's $t_3$; (e) mixture of Gaussian and a point mass; (f) centered binomial. 
Figure 3: (a) plots the log-transformed and centered time series based on the rabbit population data set; (b) plots 
the PACF; (c) plots the estimated density functions by the LCMLE (solid) and the smoothed LCMLE (dotted); (d) 
gives the Q-Q plot of the residuals against the quantiles of the fitted unsmoothed LCMLE. 
5 Appendix 



5.1 Preliminaries 

We first introduce the $p$-th Mallows distance and the Lévy-Prokhorov distance as useful measures of distance 
between two probability distributions. The $p$-th Mallows distance is also known as the $p$-th Wasserstein 
distance. For historical reasons, when $p = 1$, it is also called the Kantorovich-Rubinstein distance or the 
Earth Mover's distance. The Lévy-Prokhorov distance is a generalization of the Lévy metric defined in one 
dimension. 

More formally, for two probability measures $\mu$ and $\nu$ on the same Polish metric space equipped with the 
metric $d$, the $p$-th Mallows distance is defined as 

$$D_p(\mu, \nu) = \left[\inf E\, d(X, Y)^p\right]^{1/p},$$

where the infimum is taken over all joint distributions of the random variables $X$ and $Y$ with marginals $\mu$ 
and $\nu$ respectively. 

The Lévy-Prokhorov distance is defined as 

$$D_L(\mu, \nu) = \inf\{\epsilon > 0 \mid \mu(A) \le \nu(A^\epsilon) + \epsilon \text{ and } \nu(A) \le \mu(A^\epsilon) + \epsilon,\ \forall \text{ Borel sets } A\},$$

where $A^\epsilon$ is the $\epsilon$-neighborhood of $A$. 

Note that the Lévy-Prokhorov metric characterizes the topology of weak convergence. Furthermore, 
convergence with respect to any Mallows distance is slightly stronger than weak convergence. See 
Villani (2009) for a nice introduction to these topics. 
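For intuition, the Mallows distance between two empirical distributions on the real line with the same number of atoms can be computed by matching order statistics, because in one dimension the quantile coupling attains the infimum for every $p \ge 1$. The sketch below relies on that standard fact; the function name `mallows_distance` is illustrative and not taken from the paper.

```python
def mallows_distance(sample_x, sample_y, p=1):
    """p-th Mallows (Wasserstein) distance between two empirical distributions on the
    real line that have the same number of equally weighted atoms."""
    if len(sample_x) != len(sample_y):
        raise ValueError("this sketch only handles samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    xs, ys = sorted(sample_x), sorted(sample_y)
    # In one dimension the optimal coupling pairs the i-th smallest atoms.
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


# Shifting every atom by one unit moves the distribution by exactly one unit.
print(mallows_distance([0.0, 1.0], [1.0, 2.0]))
```

The distance depends only on the two empirical distributions, not on the order in which the observations are listed.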

Our next definition is useful in proving the theoretical properties of the LCMLE. Let $\mathcal{Q}$ be the family of 
all probability distributions on $\mathbb{R}$. Denote by $\mathcal{Q}^*$ the subset of $\mathcal{Q}$ which contains all distributions with finite 
expectation and non-zero variance. For $Q \in \mathcal{Q}$, define a profile log-likelihood type functional 

$$L(Q) = \sup_{\phi \in \Phi} \left( \int \phi \, dQ - \int e^{\phi(x)} \, dx + 1 \right).$$

If $Q$ does not have finite expectation, $L(Q) = -\infty$. If $Q$ has zero variance, $L(Q) = \infty$. 

The above functional $L(\cdot)$ is just a special (one-dimensional) case of what has been studied in Dümbgen, Samworth and Schuhmacher 
(2011). For the reader's convenience, we briefly recall some of their results which will turn out to be useful in 
Section 5.2. The following three lemmas are Theorem 2.2, Remarks 2.3-2.5 and Theorems 2.14-2.15 of 
Dümbgen, Samworth and Schuhmacher (2011) respectively. 



Lemma 5.1 (Existence). For all $Q \in \mathcal{Q}^*$, there exists a unique function 

$$\psi(\cdot|Q) \in \operatorname*{argmax}_{\phi \in \Phi} \left( \int \phi \, dQ - \int e^{\phi(x)} \, dx + 1 \right). \tag{5.1}$$

Moreover, this function $\psi$ satisfies $\int e^{\psi(x)} dx = 1$ and 

$$\operatorname{int}(\operatorname{csupp}(Q)) \subseteq \operatorname{dom}(\psi) \subseteq \operatorname{csupp}(Q),$$

where int, dom, csupp are the interior, domain and convex support operators respectively. Here the convex support 
is defined as the smallest closed interval $[b_1, b_2]$ such that $Q([b_1, b_2]) = 1$. 

Lemma 5.2 (Properties). Let $Q \in \mathcal{Q}^*$, then 

(i) First moment equality: $\int x e^{\psi(x|Q)} dx = \int x \, Q(dx)$. 

(ii) Affine equivariance: for $a, b \in \mathbb{R}$ with $b \ne 0$, let $Q_{a,b}$ be the distribution of $a + bX$ when $X$ has 
distribution $Q$, then $L(Q_{a,b}) = L(Q) - \log|b|$. 

(iii) Convexity: $L(\cdot)$ is convex on $\mathcal{Q}^*$. More precisely, for any $Q_1, Q_2 \in \mathcal{Q}^*$ and $0 < t < 1$, 
$L(tQ_1 + (1-t)Q_2) \le tL(Q_1) + (1-t)L(Q_2)$. The two sides are equal if and only if $\psi(\cdot|Q_1) = \psi(\cdot|Q_2)$. 
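As a sanity check of the affine equivariance in (ii), consider the Gaussian case. The worked example below assumes the standard fact that when $Q$ already has a log-concave density $f$, the maximizer in (5.1) is $\psi(\cdot|Q) = \log f$, so that $L(Q) = \int \log f \, dQ$; it is an illustration, not part of the cited results.

```latex
% Worked example (assumption: Q = N(\mu, \sigma^2), whose density f is log-concave,
% so that \psi(\cdot|Q) = \log f and L(Q) = \int \log f \, dQ).
\begin{align*}
L(Q) &= \int \log f \, dQ
      = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{E(X-\mu)^2}{2\sigma^2}
      = -\tfrac{1}{2}\log(2\pi e \sigma^2), \\
Q_{a,b} &= N(a + b\mu,\; b^2\sigma^2), \\
L(Q_{a,b}) &= -\tfrac{1}{2}\log(2\pi e\, b^2\sigma^2)
            = -\tfrac{1}{2}\log(2\pi e \sigma^2) - \log|b|
            = L(Q) - \log|b|.
\end{align*}
```

In particular, the shift $a$ has no effect on $L$, while stretching the distribution by a factor $|b| > 1$ lowers $L$ by $\log|b|$.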

Lemma 5.3 (Continuity). Let $Q \in \mathcal{Q}^*$ and $(Q_n)_n$ be a sequence of distributions in $\mathcal{Q}^*$. 

(i) If $\lim_{n\to\infty} D_L(Q_n, Q) = 0$, then $\limsup_{n\to\infty} L(Q_n) \le L(Q)$. 

(ii) If $\lim_{n\to\infty} D_1(Q_n, Q) = 0$, then $\lim_{n\to\infty} L(Q_n) = L(Q)$. Moreover, the probability densities $f = e^{\psi(\cdot|Q)}$ 
and $f_n = e^{\psi(\cdot|Q_n)}$ satisfy $\lim_{n\to\infty} \int |f_n(x) - f(x)| \, dx = 0$. 



5.2 Proofs 



Proof of Theorem 2.1 

First, we show that for any $n > p + q + 1$, the following event is null: 

$$\Omega = \{\exists\, \theta \in \Theta,\ m \in \mathbb{R} \text{ s.t. } \tilde\epsilon_t(\theta) = m, \text{ for } t = 1, \dots, n\}.$$

To do this, we need some well-known results from differential geometry. See Guillemin and Pollack (1974) 
for background information. 

For any set of fixed initial values, consider a function $H : \mathbb{R}^{2(p+q+1)} \to \mathbb{R}^{p+q+1}$ defined as follows: 

$$H(\theta, m, X_1, \dots, X_{p+q+1}) = (\tilde\epsilon_1(\theta) - m, \dots, \tilde\epsilon_{p+q+1}(\theta) - m)^T.$$

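It is easy to check that $H$ is a smooth (i.e. $C^\infty$) function. Furthermore, the Jacobian matrix of $H$ has 
full rank, because 

$$\operatorname{Rank}\begin{pmatrix} \dfrac{\partial H}{\partial \theta} & \dfrac{\partial H}{\partial m} & \dfrac{\partial H}{\partial X_1} & \cdots & \dfrac{\partial H}{\partial X_{p+q+1}} \end{pmatrix} = \operatorname{Rank}\begin{pmatrix} \dfrac{\partial H}{\partial \theta} & \dfrac{\partial H}{\partial m} & T \end{pmatrix} = p + q + 1, \qquad T = \begin{pmatrix} 1 & & 0 \\ \vdots & \ddots & \\ * & \cdots & 1 \end{pmatrix},$$

where the $(p+q+1) \times (p+q+1)$ block $T$ of derivatives with respect to $X_1, \dots, X_{p+q+1}$ is lower triangular with ones on the diagonal. 

Therefore, $(0, \dots, 0)^T \in \mathbb{R}^{p+q+1}$ is a regular value of $H$. 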

Denote by $C \subseteq \mathbb{R}^{p+q+1}$ the set in which for every $(X_1, \dots, X_{p+q+1})^T \in C$, $(0, \dots, 0)^T \in \mathbb{R}^{p+q+1}$ 
is a critical value for $h_{X_1, \dots, X_{p+q+1}}(\theta, m) = H(\theta, m, X_1, \dots, X_{p+q+1})$. The transversality-density 
theorem (de la Fuente, 2000, page 216) shows that $C$ has Lebesgue measure zero. Since under assumption 
(A.1), the distribution of $(X_1, \dots, X_{p+q+1})^T$ has a probability density function, it is easy to check that 
$P_{X_1, \dots, X_{p+q+1}}(C) = 0$. Furthermore, for every vector $(X_1, \dots, X_{p+q+1})^T$ on the complement of $C$, the vector 
$(0, \dots, 0)^T \in \mathbb{R}^{p+q+1}$ is regular for $h_{X_1, \dots, X_{p+q+1}}(\theta, m)$. 

Now fix any $(X_1, \dots, X_{p+q+1})^T \notin C$ and assume $\Omega$ holds. By the preimage theorem (Guillemin and Pollack, 
1974, page 21), the preimage $h^{-1}_{X_1, \dots, X_{p+q+1}}((0, \dots, 0)^T)$ is a submanifold with zero dimension, thus contains 
at most countably many isolated points; consequently, conditioning on $\{X_t\}_{t=1}^{p+q+1}$, $X_{p+q+2}$ can only take 
values at countably many points. It follows from assumption (A.1) that the event is null. 



Next, write 

$$T_n(\theta) = \sup_{\phi \in \Phi} \Lambda_n(\phi, \theta),$$

where $\Lambda_n(\cdot, \cdot)$ is defined in (2.1). On the complement of $\Omega$, Lemma 5.3 entails the continuity of $T_n(\cdot)$ over 
$\Theta$. This, combined with the compactness of $\Theta$, yields the existence of the LCMLE. □ 



Proof of Corollary 2.2 

In view of Theorem 2.1, it is enough to show that $T_n(\theta)$ is coercive. One may refer to the proof of 
Corollary 2.4 for a similar (but slightly more involved) argument. □ 

Proof of Theorem 2.3 

For any $\theta \in \Theta$, denote by $\{\epsilon_t(\theta)\}$ the strictly stationary, ergodic and non-anticipative solution of 

$$\epsilon_t(\theta) = X_t - \sum_{i=1}^{p} a_i X_{t-i} - \sum_{i=1}^{q} b_i \epsilon_{t-i}(\theta), \quad \forall t \in \mathbb{Z}. \tag{5.2}$$

Here by saying "non-anticipative", we mean a process whose value at each time $t$ is a measurable function 
of the variables $X_{t-u}$, $u = 0, 1, 2, \dots$ 

Such a solution exists because assumption (A.4) implies that all the ARMA processes with parameter 
vector in $\Theta$ are invertible, thus their innovations have AR($\infty$) representations, i.e., $\{\epsilon_t(\theta)\} = \left\{\frac{A_\theta(B)}{B_\theta(B)} X_t\right\}$, 
where $B$ is the backshift operator. In particular, $\{\epsilon_t(\theta_0)\} = \{\epsilon_t\}$. 

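It is convenient to define the empirical distributions as follows: 

$$Q_{n,\theta} = \frac{1}{n} \sum_{t=1}^{n} \delta_{\epsilon_t(\theta)} \quad \text{and} \quad \tilde{Q}_{n,\theta} = \frac{1}{n} \sum_{t=1}^{n} \delta_{\tilde\epsilon_t(\theta)}.$$

Furthermore, let $\dots, \bar{X}_{-1}, \bar{X}_0, \bar{X}_1, \dots$ be an independent new realization of the existing ARMA($p$, $q$) 
process (i.e. with $Q_0$ and $\theta_0$), and define $\{\bar\epsilon_t(\theta)\}$ analogously as shown in (5.2). Denote the distribution of 
$\bar\epsilon_1(\theta)$ by $Q_\theta$. Note that $Q_{\theta_0} = Q_0$. 

We will establish our results in the following order: 

(a) $\lim_{n\to\infty} \sup_{\theta \in \Theta} D_1(Q_{n,\theta}, \tilde{Q}_{n,\theta}) = 0$, a.s., where $D_1$ is the 1st Mallows distance. 

(b) $\liminf_{n\to\infty} \sup_{\Phi \times \Theta} \Lambda_n(\phi, \theta) \ge L(Q_0)$, a.s. 

(c) $\lim_{n\to\infty} \sup_{\theta \in \Theta} D_L(Q_{n,\theta}, Q_\theta) = 0$, a.s. 

(d) $\hat\theta_n \to \theta_0$, a.s. 

(e) $\lim_{n\to\infty} \int |\hat{f}_n(x) - f_0(x)| \, dx = 0$, a.s. 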

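(a) Asymptotic irrelevance of the initial values. Rewrite (5.2) in matrix form 

$$\boldsymbol{\epsilon}_t(\theta) = \mathbf{y}_t(\theta) + M(\theta)\, \boldsymbol{\epsilon}_{t-1}(\theta), \tag{5.3}$$

where 

$$\boldsymbol{\epsilon}_t(\theta) = \begin{pmatrix} \epsilon_t(\theta) \\ \epsilon_{t-1}(\theta) \\ \vdots \\ \epsilon_{t-q+1}(\theta) \end{pmatrix}, \quad \mathbf{y}_t(\theta) = \begin{pmatrix} X_t - \sum_{i=1}^{p} a_i X_{t-i} \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad M(\theta) = \begin{pmatrix} -b_1 & -b_2 & \cdots & -b_{q-1} & -b_q \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \ddots & & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.$$

The spectral radius of a matrix $M$, denoted by $\rho(M)$, is defined as the greatest modulus of its eigenvalues. 
It is easy to check that under assumptions (A.2), (A.3) and (A.4) 

$$\sup_{\theta \in \Theta} \rho(M(\theta)) < 1. \tag{5.4}$$

By iterating (5.3), we have 

$$\boldsymbol{\epsilon}_t(\theta) = \mathbf{y}_t(\theta) + M(\theta)\, \mathbf{y}_{t-1}(\theta) + \cdots + M^{t-1}(\theta)\, \mathbf{y}_1(\theta) + M^{t}(\theta)\, \boldsymbol{\epsilon}_0(\theta).$$

Let $\tilde{\mathbf{y}}_t(\theta)$ be the vector obtained by replacing $X_0, \dots, X_{1-p}$ with any fixed initial guesses. Let $\tilde{\boldsymbol{\epsilon}}_t(\theta)$ be the 
vector obtained by replacing $\epsilon_i(\theta)$ by $\tilde\epsilon_i(\theta)$ for all $i \le t$. We have 

$$\tilde{\boldsymbol{\epsilon}}_t(\theta) = \sum_{i=p+1}^{t} M^{t-i}(\theta)\, \mathbf{y}_i(\theta) + \sum_{i=1}^{\min(p,t)} M^{t-i}(\theta)\, \tilde{\mathbf{y}}_i(\theta) + M^{t}(\theta)\, \tilde{\boldsymbol{\epsilon}}_0(\theta).$$

It follows immediately from (5.4) that almost surely 

$$\sup_{\theta \in \Theta} |\epsilon_t(\theta) - \tilde\epsilon_t(\theta)| \le \sup_{\theta \in \Theta} \|\boldsymbol{\epsilon}_t(\theta) - \tilde{\boldsymbol{\epsilon}}_t(\theta)\|_2 \le \sup_{\theta \in \Theta} \left\| \sum_{i=1}^{\min(p,t)} M^{t-i}(\theta) \left( \mathbf{y}_i(\theta) - \tilde{\mathbf{y}}_i(\theta) \right) + M^{t}(\theta) \left( \boldsymbol{\epsilon}_0(\theta) - \tilde{\boldsymbol{\epsilon}}_0(\theta) \right) \right\|_2 \le K \rho^t,$$

where $K > 0$ and $0 < \rho < 1$ are two constants, and $\|\cdot\|_2$ is the Euclidean norm. Now elementary 
considerations show that almost surely 

$$\limsup_{n\to\infty} \sup_{\theta \in \Theta} D_1(Q_{n,\theta}, \tilde{Q}_{n,\theta}) \le \limsup_{n\to\infty} \frac{1}{n} \sum_{t=1}^{n} K \rho^t \le \limsup_{n\to\infty} \frac{1}{n} \frac{K}{1 - \rho} = 0.$$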
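The geometric decay behind point (a) is easy to see numerically in the simplest case $p = q = 1$, where $M(\theta)$ reduces to the scalar $-b$. The sketch below runs the residual recursion (5.2) twice on the same observations with two different sets of initial guesses; the function name, parameter values and initial guesses are hypothetical and only serve as an illustration.

```python
import random


def arma11_residuals(x, a, b, x0, eps0):
    """Residuals from eps_t = X_t - a * X_{t-1} - b * eps_{t-1}, started from the
    initial guesses x0 (for X_0) and eps0 (for eps_0)."""
    residuals = []
    prev_x, prev_eps = x0, eps0
    for value in x:
        eps = value - a * prev_x - b * prev_eps
        residuals.append(eps)
        prev_x, prev_eps = value, eps
    return residuals


rng = random.Random(1)
observations = [rng.gauss(0.0, 1.0) for _ in range(30)]  # any observed series works here
a, b = 0.5, 0.6

first = arma11_residuals(observations, a, b, x0=0.0, eps0=0.0)
second = arma11_residuals(observations, a, b, x0=3.0, eps0=-2.0)

# The gap obeys gap_t = |b| * gap_{t-1}, so it shrinks geometrically when |b| < 1.
gaps = [abs(u - v) for u, v in zip(first, second)]
print([round(g, 6) for g in gaps[:5]])
```

Because the gaps are summable, their average over $t = 1, \dots, n$ tends to zero, which is what drives the $D_1$ bound above.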

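(b) The lower bound. It is well known in empirical process theory that $D_1(Q_{n,\theta_0}, Q_0) \to 0$ almost surely. This 
and point (a) entail $D_1(\tilde{Q}_{n,\theta_0}, Q_0) \to 0$ almost surely. By Lemma 5.3, almost surely 

$$\liminf_{n\to\infty} \sup_{\Phi \times \Theta} \Lambda_n(\phi, \theta) \ge \liminf_{n\to\infty} \sup_{\phi \in \Phi} \Lambda_n(\phi, \theta_0) = \liminf_{n\to\infty} L(\tilde{Q}_{n,\theta_0}) = L(Q_0).$$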

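(c) Uniform convergence in $D_L$. We combine a Prohorov type approach with the standard compactness 
argument to establish this point. For all $\theta \in \Theta$ and any positive integer $k$, denote by $V_k(\theta)$ the open 
ball centered at $\theta$ of radius $1/k$. 

We first show that for any fixed $\theta^* \in \Theta$, almost surely 

$$\lim_{k\to\infty} \limsup_{n\to\infty} \sup_{\theta \in V_k(\theta^*) \cap \Theta} D_L(Q_{n,\theta}, Q_{\theta^*}) = 0. \tag{5.5}$$

To see this, we note that for any fixed $u \in \mathbb{R}$, 

$$\sup_{\theta \in V_k(\theta^*) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} 1\{\epsilon_t(\theta) \le u\} \le \frac{1}{n} \sum_{t=1}^{n} \sup_{\theta \in V_k(\theta^*) \cap \Theta} 1\{\epsilon_t(\theta) \le u\} \le \frac{1}{n} \sum_{t=1}^{n} 1\left\{ \inf_{\theta \in V_k(\theta^*) \cap \Theta} \epsilon_t(\theta) \le u \right\}.$$

Notice that the function $1\{\inf_{\theta \in V_k(\theta^*) \cap \Theta} \epsilon_t(\theta) \le u\}$ is measurable because $\epsilon_t(\theta)$ is a continuous function of $\theta$. 
Therefore we can use Theorem 36.4 of Billingsley (1995) and the pointwise ergodic theorem to deduce that 
almost surely 

$$\limsup_{n\to\infty} \sup_{\theta \in V_k(\theta^*) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} 1\{\epsilon_t(\theta) \le u\} \le P\left\{ \inf_{\theta \in V_k(\theta^*) \cap \Theta} \epsilon_1(\theta) \le u \right\}.$$

The monotone convergence theorem says that $P\{\inf_{\theta \in V_k(\theta^*) \cap \Theta} \epsilon_1(\theta) \le u\}$ decreases to $P(\epsilon_1(\theta^*) \le u)$ as 
$k \to \infty$. Applying a similar argument to the infimum, we obtain that almost surely 

$$P(\epsilon_1(\theta^*) < u) \le \liminf_{k\to\infty} \liminf_{n\to\infty} \inf_{\theta \in V_k(\theta^*) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} 1\{\epsilon_t(\theta) < u\} \tag{5.6}$$

$$\le \limsup_{k\to\infty} \limsup_{n\to\infty} \sup_{\theta \in V_k(\theta^*) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} 1\{\epsilon_t(\theta) \le u\} \le P(\epsilon_1(\theta^*) \le u). \tag{5.7}$$

The tightness of the family $\{Q_{n,\theta} : \theta \in V_k(\theta^*) \cap \Theta\}$ then follows from (5.6) and (5.7) for sufficiently large $k$. 

Now suppose (5.5) does not hold. It is then possible to find a subsequence $k_j \in \mathbb{N}$ with $n(k_j) < n(k_{j+1})$ 
and $\theta_{k_j} \in V_{k_j}(\theta^*)$ for all $j \in \mathbb{N}$ such that 

$$\lim_{j\to\infty} D_L(Q_{n(k_j), \theta_{k_j}}, Q_{\theta^*}) > 0.$$

By Prohorov's theorem, extracting a further subsequence if necessary, there exists a probability 
distribution $Q^*$ such that 

$$\lim_{j\to\infty} D_L(Q_{n(k_j), \theta_{k_j}}, Q^*) = 0.$$

Therefore $D_L(Q^*, Q_{\theta^*}) > 0$. An application of the Portmanteau theorem shows that there exists at least one 
$u \in \mathbb{R}$ such that 

$$Q_{n(k_j), \theta_{k_j}}((-\infty, u]) > Q_{\theta^*}((-\infty, u]).$$

But this contradicts (5.7) (using the fact that for any fixed $n$, $\sup_{\theta \in V_k(\theta^*) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} 1\{\epsilon_t(\theta) \le u\}$ is a 
decreasing function with respect to $k$). Consequently, (5.5) holds true. 
Moreover, by a similar Prohorov type of argument, one can show that 

$$\lim_{k\to\infty} \sup_{\theta \in V_k(\theta^*) \cap \Theta} D_L(Q_\theta, Q_{\theta^*}) = 0. \tag{5.8}$$

Thus 

$$\lim_{k\to\infty} \limsup_{n\to\infty} \sup_{\theta \in V_k(\theta^*) \cap \Theta} D_L(Q_{n,\theta}, Q_\theta) = 0, \text{ a.s.}$$

We conclude the proof of point (c) by a compactness argument. For any arbitrary $\delta > 0$, for every $\theta^* \in \Theta$, 
we can find a neighborhood $V(\theta^*)$ satisfying 

$$\limsup_{n\to\infty} \sup_{\theta \in V(\theta^*) \cap \Theta} D_L(Q_{n,\theta}, Q_\theta) < \delta, \text{ a.s.}$$

Because $\Theta$ is compact, there exists a finite subcover of $\Theta$ of the form $V(\theta_1), \dots, V(\theta_k)$. Thus 

$$\limsup_{n\to\infty} \sup_{\theta \in \Theta} D_L(Q_{n,\theta}, Q_\theta) \le \limsup_{n\to\infty} \max_{i=1,\dots,k} \sup_{\theta \in V(\theta_i) \cap \Theta} D_L(Q_{n,\theta}, Q_\theta) < \delta, \text{ a.s.}$$

This completes the proof of point (c). 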

(d) Convergence of $\hat\theta_n$. To verify the assertion it suffices to consider a sequence of fixed observations 
$X_1, X_2, \dots$ such that points (a) - (c) hold true. Our proof relies on the following simple result from analysis: 
assume that $\{m_n\}$ is a bounded sequence with the property that every convergent subsequence of $\{m_n\}$ 
converges to the same limit $m_0$, then $\{m_n\}$ must converge to $m_0$. Now consider any convergent subsequence 
of $\hat\theta_n$ that converges to an arbitrary $\theta^*$, which we denote by $\hat\theta_{n(j)} \to \theta^*$. Because $\Theta$ is compact, $\theta^* \in \Theta$. 
Our goal is to show that $\theta^* = \theta_0$. Point (c), together with (5.8), entails that 

$$\lim_{j\to\infty} D_L(Q_{n(j), \hat\theta_{n(j)}}, Q_{\theta^*}) = 0.$$

Since the convergence in the Mallows metric $D_1$ is stronger than weak convergence, combining this with 
point (a) leads to $\tilde{Q}_{n(j), \hat\theta_{n(j)}} \xrightarrow{d} Q_{\theta^*}$. Moreover, because $\bar\epsilon_1(\theta_0)$ and $\bar\epsilon_1(\theta^*) - \bar\epsilon_1(\theta_0)$ are independent, by 
Lemma 5.3 and Theorem 3.5 of Dümbgen, Samworth and Schuhmacher (2011), 

$$\limsup_{j\to\infty} L(\tilde{Q}_{n(j), \hat\theta_{n(j)}}) \le L(Q_{\theta^*}) \le L(Q_0).$$

In light of point (b), this implies that there must exist a constant $m \in \mathbb{R}$ such that with probability one 

$$\bar\epsilon_1(\theta^*) - \bar\epsilon_1(\theta_0) = m. \tag{5.9}$$

Let $B$ be the backshift operator. Under assumption (A.4), $B_\theta(B)$ is invertible for all $\theta \in \Theta$, so (5.9) is 
equivalent to 

$$\left( \frac{A_{\theta^*}(B)}{B_{\theta^*}(B)} - \frac{A_{\theta_0}(B)}{B_{\theta_0}(B)} \right) \bar{X}_1 = m, \quad \text{w.p.1.}$$

If the operator in $B$ on the left hand side were not null, then there would exist a constant linear combination 
of $\bar{X}_1, \bar{X}_0, \bar{X}_{-1}, \dots$ This is impossible since the innovations are nondegenerate by assumption (A.1) (or 
(A.1*)). We thus have 

$$\frac{A_{\theta^*}(z)}{B_{\theta^*}(z)} = \frac{A_{\theta_0}(z)}{B_{\theta_0}(z)}, \quad \forall |z| \le 1.$$

It follows under assumption (A.5) that $A_{\theta^*} = A_{\theta_0}$ and $B_{\theta^*} = B_{\theta_0}$, so $\theta^* = \theta_0$. Finally, since $\Theta$ is compact 
and the convergent subsequence is picked arbitrarily, we obtain $\hat\theta_n \to \theta_0$. 

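(e) Convergence of $\hat{f}_n$. Recall that the weak convergence of $Q_{n,\hat\theta_n}$ to $Q_0$ is established in the proof 
of point (d). Denote by $\mu'_k(Q)$ the $k$-th moment of the distribution $Q$. We now show the convergence in the 
first moment, i.e. $\mu'_1(Q_{n,\hat\theta_n}) \to \mu'_1(Q_0)$ almost surely. Using the notation from the proof of point (c) and applying the 
ergodic theorem to both the infimum and the supremum, we have that almost surely 

$$\liminf_{n\to\infty} \inf_{\theta \in V_k(\theta_0) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} \epsilon_t(\theta) \ge E \inf_{\theta \in V_k(\theta_0) \cap \Theta} \bar\epsilon_1(\theta),$$

$$\limsup_{n\to\infty} \sup_{\theta \in V_k(\theta_0) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} \epsilon_t(\theta) \le E \sup_{\theta \in V_k(\theta_0) \cap \Theta} \bar\epsilon_1(\theta).$$

The continuity of $\bar\epsilon_1(\theta)$ and the monotone convergence theorem entail that 

$$\lim_{k\to\infty} \liminf_{n\to\infty} \inf_{\theta \in V_k(\theta_0) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} \epsilon_t(\theta) = \lim_{k\to\infty} \limsup_{n\to\infty} \sup_{\theta \in V_k(\theta_0) \cap \Theta} \frac{1}{n} \sum_{t=1}^{n} \epsilon_t(\theta) = E \bar\epsilon_1(\theta_0), \text{ a.s.}$$

This, together with point (d), entails $\mu'_1(Q_{n,\hat\theta_n}) \to \mu'_1(Q_0)$ almost surely. Now we can use Theorem 6.9 of Villani (2009) 
to show almost sure convergence in the 1st Mallows metric of $Q_{n,\hat\theta_n}$ to $Q_0$. Moreover, it follows from point 
(a) that $D_1(\tilde{Q}_{n,\hat\theta_n}, Q_0) \to 0$ almost surely. Point (e) can now be established via Lemma 5.3. □ 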



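Proof of Corollary 2.4 

In view of the proof of Theorem 2.3, all that remains is to show the almost sure boundedness of $\|\hat\theta_n\|_2$. 
Let $\mu_X = E X_1$. Using the fact that $\epsilon_t(\hat\theta_n) = \epsilon_t + \sum_{i=1}^{p} (a_{0i} - \hat{a}_{ni}) X_{t-i}$ and with some careful 
calculations, we have 

$$\int |x - \mu'_1(\tilde{Q}_{n,\hat\theta_n})| \, \tilde{Q}_{n,\hat\theta_n}(dx) \ge \frac{1}{n} \sum_{t=1}^{n} \left| \sum_{i=1}^{p} (a_{0i} - \hat{a}_{ni})(X_{t-i} - \mu_X) \right| - \left| \frac{1}{n} \sum_{t=1}^{n} \sum_{i=1}^{p} (a_{0i} - \hat{a}_{ni})(X_{t-i} - \mu_X) \right| - \frac{1}{n} \sum_{t=1}^{n} |\epsilon_t| - \left| \frac{1}{n} \sum_{t=1}^{n} \epsilon_t \right|.$$

It follows from Lemma 3.1 of Dümbgen, Samworth and Schuhmacher (2011), the law of large numbers and 
point (b) in the previous proof that 

$$\frac{1}{n} \sum_{t=1}^{n} \left| \sum_{i=1}^{p} (a_{0i} - \hat{a}_{ni})(X_{t-i} - \mu_X) \right| - \left| \frac{1}{n} \sum_{t=1}^{n} \sum_{i=1}^{p} (a_{0i} - \hat{a}_{ni})(X_{t-i} - \mu_X) \right| \le C_1 \tag{5.10}$$

almost surely, for sufficiently large $n \in \mathbb{N}$, provided that $C_1 > 2 \int |t| f_0(t) \, dt + e^{-L(Q_0)}$. 

Let us consider the set $\{\theta \in \mathbb{R}^p : \|\theta - \theta_0\|_2 = 1\}$. By the uniform ergodic theorem, almost surely 

$$\lim_{n\to\infty} \sup_{\theta : \|\theta - \theta_0\|_2 = 1} \left| \frac{1}{n} \sum_{t=1}^{n} \left| \sum_{i=1}^{p} (a_{0i} - a_i)(X_{t-i} - \mu_X) \right| - E \left| \sum_{i=1}^{p} (a_{0i} - a_i)(X_{p+1-i} - \mu_X) \right| \right| = 0, \tag{5.11}$$

$$\lim_{n\to\infty} \sup_{\theta : \|\theta - \theta_0\|_2 = 1} \left| \frac{1}{n} \sum_{t=1}^{n} \sum_{i=1}^{p} (a_{0i} - a_i)(X_{t-i} - \mu_X) \right| = 0. \tag{5.12}$$

Observe that $E \left| \sum_{i=1}^{p} (a_{0i} - a_i)(X_{p+1-i} - \mu_X) \right| > 0$, because otherwise $\{X_1 - \mu_X, \dots, X_p - \mu_X\}$ would be 
linearly dependent, which would violate assumption (A.1). By the compactness of $\{\theta \in \mathbb{R}^p : \|\theta - \theta_0\|_2 = 1\}$: 

$$\min_{\theta : \|\theta - \theta_0\|_2 = 1} E \left| \sum_{i=1}^{p} (a_{0i} - a_i)(X_{p+1-i} - \mu_X) \right| = C_2 > 0.$$

Because of the scaling property, 

$$\min_{\theta : \|\theta - \theta_0\|_2 = u} E \left| \sum_{i=1}^{p} (a_{0i} - a_i)(X_{p+1-i} - \mu_X) \right| = u C_2. \tag{5.13}$$

Putting (5.10), (5.11), (5.12) and (5.13) together entails that almost surely $\|\hat\theta_n - \theta_0\|_2 \le C_1 / C_2$, which also 
implies that $\|\hat\theta_n\|_2$ is bounded. □ 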

Proof of Theorem 3.1 

Following the scheme of the proof of Theorem 2.1, it suffices to show that for $n > p + q + r + s + 1$ the 
following event is null: 

$$\Omega = \{\exists\, \theta \in \Theta',\ m \in \mathbb{R} \text{ s.t. } \tilde\eta_t(\theta) = m \tilde\sigma_t(\theta), \text{ for } t = 1, \dots, n\}.$$

Now let us construct the function $H : \Theta' \times \mathbb{R} \times \mathbb{R}^{p+q+r+s+1} \to \mathbb{R}^{p+q+r+s+1}$ as 

$$H(\theta, m, X_1, \dots, X_{p+q+r+s+1}) = (\tilde\eta_1(\theta) - m \tilde\sigma_1(\theta), \dots, \tilde\eta_{p+q+r+s+1}(\theta) - m \tilde\sigma_{p+q+r+s+1}(\theta))^T.$$

Note that $H$ is actually an $\mathbb{R}^{2(p+q+r+s+1)} \to \mathbb{R}^{p+q+r+s+1}$ mapping, because the $(p + q + 1)$-th component of 
any $\theta \in \Theta'$ is always one. 

The rest of the proof is similar to that of Theorem 2.1, so is omitted. □ 



The next lemma is a version of Slutsky's theorem with respect to the 1st Mallows distance. 

Lemma 5.4. Let $X_0, X_1, X_2, \dots$ be univariate random variables with corresponding distributions $P_0, P_1, P_2, \dots$ 
Suppose $E|X_0| < \infty$ and $D_1(P_n, P_0) \to 0$. 

(i) Let $m_1, m_2, \dots$ be a real sequence with finite limit $\lim_{n\to\infty} m_n = m_0$. Denote by $Q_0, Q_1, \dots$ the 
corresponding distributions of $m_0 X_0, m_1 X_1, \dots$, then $D_1(Q_n, Q_0) \to 0$. 

(ii) Let $Y$ be a univariate random variable independent of $\{X_i\}_{i=0}^{\infty}$ with $E|Y| < \infty$. Denote by $Q_0, Q_1, \dots$ 
the corresponding distributions of $X_0 Y, X_1 Y, \dots$, then $D_1(Q_n, Q_0) \to 0$. 
Proof of Lemma 5.4 

We only show (i) here. One can use a similar argument to prove (ii). 
Recall the definition of the 1st Mallows distance, 

$$D_1(Q_n, Q_0) = \inf_{(X_n, X_0)} E|m_n X_n - m_0 X_0|,$$

where the infimum is taken over all pairs $(X_n, X_0)$ of random variables $X_n \sim P_n$, $X_0 \sim P_0$ on a common 
probability space. Since $D_1$ convergence implies $E|X_n| \to E|X_0| < \infty$, we have 

$$\inf_{(X_n, X_0)} E|m_n X_n - m_0 X_0| \le \inf_{(X_n, X_0)} \{E|m_n X_n - m_0 X_n| + E|m_0 X_n - m_0 X_0|\} \le |m_n - m_0| \, E|X_n| + |m_0| \inf_{(X_n, X_0)} E|X_n - X_0| \to 0,$$

as desired. □ 
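The final bound in this proof can be checked directly on small empirical distributions. The sketch below reuses the order-statistic formula for the 1st Mallows distance between equal-size samples (a standard one-dimensional fact); the sample values and the names `mallows_1`, `x_0`, `x_n`, `m_0`, `m_n` are hypothetical.

```python
def mallows_1(sample_x, sample_y):
    """1st Mallows distance between two equal-size empirical distributions on the real line."""
    if len(sample_x) != len(sample_y):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(sample_x), sorted(sample_y))) / len(sample_x)


# Atoms of P_0 and of some P_n close to it, and multipliers m_0 and m_n close to each other.
x_0 = [-2.0, -0.5, 0.1, 1.4, 3.0]
x_n = [-1.8, -0.6, 0.3, 1.5, 2.7]
m_0, m_n = 2.0, 2.2

# Left-hand side: D_1 between the distributions of m_n * X_n and m_0 * X_0.
lhs = mallows_1([m_n * v for v in x_n], [m_0 * v for v in x_0])

# Right-hand side: |m_n - m_0| * E|X_n| + |m_0| * D_1(P_n, P_0).
mean_abs_x_n = sum(abs(v) for v in x_n) / len(x_n)
rhs = abs(m_n - m_0) * mean_abs_x_n + abs(m_0) * mallows_1(x_n, x_0)

print(lhs <= rhs)
```

Both terms on the right-hand side vanish as $m_n \to m_0$ and $D_1(P_n, P_0) \to 0$, which is the content of part (i).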

The next lemma enhances our understanding of the behavior of the functional $\psi(\cdot|Q)$ given in (5.1). It 
plays a critical role in the proof of Theorem 5.6. 

Lemma 5.5. Let $X_u, X_l, Y$ be univariate random variables. Let $R_u$, $R_l$ and $Q$ be the corresponding 
distributions of $X_u Y$, $X_l Y$ and $Y$. Assume that 

(i) $X_u$ and $Y$ are independent, with $E|X_u| < \infty$; 
(ii) $X_l$ and $Y$ are independent; 
(iii) $Q \in \mathcal{Q}^*$; 
(iv) There exists $m > 0$ such that $P(X_u > m) = 1$ and $P(m > X_l > 0) = 1$. 

Then $\psi(\cdot|R_u) \ne \psi(\cdot|R_l)$. 



Proof of Lemma 5.5 

First we show that both $\psi(\cdot|R_u)$ and $\psi(\cdot|R_l)$ uniquely exist. In view of Lemma 5.1, it is enough to check 
that $R_u \in \mathcal{Q}^*$ and $R_l \in \mathcal{Q}^*$. This can be easily done using $Q \in \mathcal{Q}^*$, $E|X_u| < \infty$ and $E|X_l| < \infty$. 

Now suppose $\psi(\cdot|R_u) = \psi(\cdot|R_l) =: \psi(\cdot)$. We claim that the expectation of $Y$ is zero. This is due to the 
first moment equality in Lemma 5.2. Moreover, the convex support of $Q$ must be $\mathbb{R}$. Otherwise, by the 
second part of Lemma 5.1, the domains of $\psi(\cdot|R_u)$ and $\psi(\cdot|R_l)$ would be different, which would contradict 
$\psi(\cdot|R_u) = \psi(\cdot|R_l)$. 

Because $\psi(\cdot)$ is concave and defines a density, there exists $v \in (-\infty, \infty)$ such that 

$$\psi(v) > \frac{1}{2} \{\psi(v - \delta) + \psi(v + \delta)\} \text{ for all } \delta > 0.$$

Without loss of generality, we may assume $v < 0$, since otherwise by symmetry one may just take the additive 
inverse of $Y$. 

Let $G$ be the cumulative distribution function with log-density $\psi$. Then by Theorem 2.7 of Dümbgen, Samworth and Schuhmacher 
(2011), 

$$\int_{-\infty}^{v} \{P(X_u Y \le t) - G(t)\} \, dt = 0, \qquad \int_{-\infty}^{v} \{P(X_l Y \le t) - G(t)\} \, dt = 0.$$

It follows that 

$$\int_{-\infty}^{v} \{P(X_u Y \le t) - P(X_l Y \le t)\} \, dt = 0. \tag{5.14}$$

Note that for every $t \in (-\infty, v] \subseteq (-\infty, 0]$, we have 

$$P(X_l Y \le t) \le P(Y \le t/m) \le P(X_u Y \le t). \tag{5.15}$$

Because cumulative distribution functions are right continuous with left limits (càdlàg), (5.14) and (5.15) 
imply that 

$$P(X_u Y \le t) = P(Y \le t/m) = P(X_l Y \le t), \text{ for every } t \in (-\infty, v).$$

As $P(X_u > m) = 1$, we can find some $\delta > 0$ such that $P(X_u > m + \delta) > 0$. Now 

$$P(Y \le t/m) = P(X_u Y \le t) \ge P(X_u > m + \delta) \, P\left(Y \le \frac{t}{m + \delta}\right) + P(m + \delta \ge X_u > m) \, P(Y \le t/m).$$

From above, we obtain $P(Y \le t/m) \ge P\left(Y \le \frac{t}{m+\delta}\right)$, which implies $P(Y \le t/m) = P\left(Y \le \frac{t}{m+\delta}\right)$ for all 
$t \in (-\infty, v) \subseteq (-\infty, 0)$. Consequently, if we take any fixed $t \in (-\infty, v)$, then 

$$P(Y \le t/m) = \sum_{i=1}^{\infty} P\left( \frac{t}{m} \left(\frac{m+\delta}{m}\right)^{i} < Y \le \frac{t}{m} \left(\frac{m+\delta}{m}\right)^{i-1} \right) = 0.$$

On the other hand, because the convex support of $Q$ is $\mathbb{R}$, we must have $P(Y \le t/m) > 0$. The proof is 
complete by reductio ad absurdum. □ 

The following theorem can be viewed as a version of Jensen's inequality on $\mathcal{Q}^*$. It serves as the key 
element in proving Theorem 3.2 and may be of some independent interest as well. 

Theorem 5.6. Let $X, Y$ be univariate random variables with corresponding distributions $P, Q$ and $Q \in \mathcal{Q}^*$. 
Suppose further that $X$ and $Y$ are independent, with $P(X \ge 0) = 1$ and $E \log X = m < \infty$. Denote the 
distribution of $XY$ by $R$. Then 

$$L(R) \le L(Q) - m. \tag{5.16}$$

The equality holds if and only if $X = e^m$ with probability one. 



Proof of Theorem 5.6 

The inequality is trivial in the following cases: 

(i) $EX = \infty$: Because $Q \in \mathcal{Q}^*$, $E|Y| > 0$ and $L(Q)$ is finite. Note that $E|XY| = E|X| \, E|Y| = \infty$, so 
$L(R) = -\infty$. In this case, the inequality is strict. 

(ii) $\operatorname{var}(X) = 0$: $P$ is a point mass, so $L(R) = L(Q) - m$ by the affine equivariance of $L(\cdot)$. 

(iii) $E \log X = -\infty$: For the equality to hold, one needs $L(R) = \infty$, thus $R$ is a point mass. It then follows 
that $P(X = 0) = 1$. 

For the remainder of the proof, we assume $P \in \mathcal{Q}^*$ and $m > -\infty$. This implies that $R \in \mathcal{Q}^*$. 

Denote by $F$ and $G$ the cumulative distribution functions corresponding to $P$ and $Q$. Let $X_n$ be a random 
variable with the corresponding distribution $P_n$ defined as 

$$P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{F^{-1}\left(\frac{i}{n+1}\right)},$$

where $F^{-1}$ is the generalized inverse function of $F$, i.e. $F^{-1}(p) = \inf\{x \in \mathbb{R} : p \le F(x)\}$. 

Let $R_n$ be the distribution corresponding to $X_n Y$. Abusing notation slightly in the following, given $t \in \mathbb{R}$, 
we denote by $Q_t$ the distribution corresponding to the random variable $tY$. Then $R_n = \frac{1}{n} \sum_{i=1}^{n} Q_{F^{-1}\left(\frac{i}{n+1}\right)}$. 
Because $L(\cdot)$ is convex and affine equivariant (Lemma 5.2), 

$$L(R_n) \le \frac{1}{n} \sum_{i=1}^{n} L\left(Q_{F^{-1}\left(\frac{i}{n+1}\right)}\right) = L(Q) - \frac{1}{n} \sum_{i=1}^{n} \log F^{-1}\left(\frac{i}{n+1}\right). \tag{5.17}$$

Since $D_1(P_n, P) \to 0$, Lemma 5.4(ii) shows that $D_1(R_n, R) \to 0$. It follows from Lemma 5.3 that 
$\lim_{n\to\infty} L(R_n) = L(R)$. Furthermore, 

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \log F^{-1}\left(\frac{i}{n+1}\right) = \int_0^1 \log F^{-1}(p) \, dp = m.$$

We now let $n \to \infty$ on both sides of (5.17) to establish the inequality (5.16). 
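The limit of the averaged log-quantiles used in this step can be illustrated numerically. The sketch below takes $X \sim \mathrm{Exp}(1)$ as a hypothetical example, for which $F^{-1}(p) = -\log(1-p)$ and $E \log X = -\gamma$, with $\gamma$ the Euler-Mascheroni constant. The grid points $i/(n+1)$ are an assumption about the discretization, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # E log X = -gamma when X ~ Exp(1)


def quantile_exp1(p):
    """Generalized inverse of the Exp(1) distribution function: F^{-1}(p) = -log(1 - p)."""
    return -math.log1p(-p)


def averaged_log_quantiles(quantile, n):
    """(1/n) * sum over i = 1..n of log F^{-1}(i / (n + 1))."""
    return sum(math.log(quantile(i / (n + 1))) for i in range(1, n + 1)) / n


approx = averaged_log_quantiles(quantile_exp1, 100000)
print(approx, -EULER_GAMMA)
```

As $n$ grows, the average approaches $\int_0^1 \log F^{-1}(p) \, dp = E \log X$, which is the quantity $m$ appearing in (5.16).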

Next, we show that $L(R) = L(Q) - m$ implies that $X$ must be almost surely constant. Fix $v = F^{-1}(1/2)$. 
It follows from $m > -\infty$ that $v > 0$ and $P(X > 0) = 1$. Suppose $X$ is not almost surely constant, then 
$P(X > v) = p \in [1/2, 1)$. Denote by $R_u$ and $R_l$ the corresponding distributions of $(XY | X > v)$ and 
$(XY | X \le v)$. Clearly, $R = pR_u + (1 - p)R_l$. From Lemma 5.5, $\psi(\cdot|R_u) \ne \psi(\cdot|R_l)$. Now by the convexity of 
$L(\cdot)$ again, we have 

$$L(R) < pL(R_u) + (1 - p)L(R_l).$$

Using (5.16) proved above, 

$$pL(R_u) + (1 - p)L(R_l) \le pL(Q) - E(\log X \, 1\{X > v\}) + (1 - p)L(Q) - E(\log X \, 1\{X \le v\}) = L(Q) - E \log X = L(Q) - m.$$

Consequently, $L(R) < L(Q) - m$, as required. □ 



Corollary 5.7. Let $X_1, X_2, Y$ be univariate random variables with corresponding distributions $P_1, P_2$ and 
$Q$, where $Q \in \mathcal{Q}^*$. Suppose that $X_1$ and $Y$ are independent, $X_2$ and $Y$ are independent, with $P(X_2 > 0) = 1$ and 
$E \log X_2 = m \in (-\infty, \infty)$. Denote the distribution of $(X_1 + Y) X_2$ by $R$. Then 

$$L(R) \le L(Q) - m.$$

The equality holds if and only if $P_1 = \delta_u$ for some $u \in \mathbb{R}$ and $P_2 = \delta_{e^m}$. 

The proof of the above corollary is omitted owing to its similarity to that of Theorem 5.6. 
Before proceeding, we note that there are some similarities between the proofs of Theorem 2.3 and 
Theorem 3.2, mainly due to the ARMA representation of GARCH. So certain details in the proof of Theorem 3.2 
are omitted. Still, because of the emergence of the logarithmic term in (3.2), our proof below is more involved. 

Proof of Theorem 3.2 

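Under assumptions (A.4) and (B.4), $\{X_t\}$ is stationary and ergodic. Let $\{\eta_t(\theta)\}$ and $\{\sigma_t^2(\theta)\}$ be respectively 
the stationary, ergodic and non-anticipative solutions of 

$$\eta_t(\theta) = X_t - \sum_{i=1}^{p} a_i X_{t-i} - \sum_{i=1}^{q} b_i \eta_{t-i}(\theta), \quad \forall t \in \mathbb{Z}, \tag{5.18}$$

$$\sigma_t^2(\theta) = c + \sum_{i=1}^{r} \alpha_i \eta_{t-i}^2(\theta) + \sum_{i=1}^{s} \beta_i \sigma_{t-i}^2(\theta), \quad \forall t \in \mathbb{Z}. \tag{5.19}$$

Note that assumptions (A.4) and (B.2)-(B.4) ensure the existence of such solutions. 
Define the empirical distributions as 

$$Q_{n,\theta} = \frac{1}{n} \sum_{t=1}^{n} \delta_{\eta_t(\theta)/\sigma_t(\theta)} \quad \text{and} \quad \tilde{Q}_{n,\theta} = \frac{1}{n} \sum_{t=1}^{n} \delta_{\tilde\eta_t(\theta)/\tilde\sigma_t(\theta)}.$$

Let $\dots, \bar{X}_{-1}, \bar{X}_0, \bar{X}_1, \dots$ be an independent new realization of the existing ARMA($p$, $q$)-GARCH($r$, $s$), 
and define $\{\bar\eta_t(\theta)\}$ and $\{\bar\sigma_t^2(\theta)\}$ analogously as shown in (5.18) and (5.19). Denote the distribution of $\bar\eta_1(\theta)/\bar\sigma_1(\theta)$ 
by $Q_\theta$. 

We will split our proof into several parts: 

(a) $\lim_{n\to\infty} \sup_{\theta \in \Theta'} D_2(Q_{n,\theta}, \tilde{Q}_{n,\theta}) = 0$, a.s., where $D_2$ is the 2nd Mallows distance. 

(b) $\lim_{n\to\infty} \sup_{\theta \in \Theta'} \frac{1}{2n} \left| \sum_{t=1}^{n} \log \tilde\sigma_t^2(\theta) - \sum_{t=1}^{n} \log \sigma_t^2(\theta) \right| = 0$, a.s. 

(c) For any $\theta \in \Theta'$, $E \log \bar\sigma_1^2(\theta) < \infty$. 

(d) $\liminf_{n\to\infty} \sup_{\Phi \times \Theta'} \Lambda_n(\phi, \theta) \ge L(Q_0) - \frac{1}{2} E \log \bar\sigma_1^2(\theta_0)$, a.s. 

(e) $\lim_{n\to\infty} \sup_{\theta \in \Theta'} \left| \frac{1}{n} \sum_{t=1}^{n} \log \sigma_t^2(\theta) - E \log \bar\sigma_1^2(\theta) \right| = 0$, a.s. 

(f) $\lim_{n\to\infty} \sup_{\theta \in \Theta'} D_L(Q_{n,\theta}, Q_\theta) = 0$, a.s. 

(g) $\hat\theta'_n \to \theta'_0$, a.s., where we write for convenience 

$$\theta'_0 = \left( a_{01}, \dots, a_{0p}, b_{01}, \dots, b_{0q}, 1, \frac{\alpha_{01}}{c_0}, \dots, \frac{\alpha_{0r}}{c_0}, \beta_{01}, \dots, \beta_{0s} \right)^T.$$

(h) $\hat{c}_n \to c_0$, a.s. 

(i) $\lim_{n\to\infty} \int |\hat{f}_n(x) - f_0(x)| \, dx = 0$, a.s. 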

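(a) Asymptotic irrelevance of the initial values - I. In view of the matrix representations of ARMA and GARCH, assumptions (A.4) and (B.2)-(B.4) imply that almost surely

$$\sup_{\theta\in\Theta'} |\tilde\eta_t(\theta) - \eta_t(\theta)| \le K\rho^t, \quad \forall t \in \mathbb{N}, \tag{5.20}$$

$$|\tilde\sigma_t^2(\theta) - \sigma_t^2(\theta)| \le K\rho^t \sum_{i=1-r}^{t-1} \left(|\eta_i(\theta)| + 1\right), \quad \forall \theta \in \Theta',\ \forall t \in \mathbb{N}, \tag{5.21}$$

where $K > 0$ and $0 < \rho < 1$ are two generic constants. See also point (a) in the proof of Theorem 2.3 for reference. It then follows that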



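$$\begin{aligned}
\limsup_{n\to\infty} \sup_{\theta\in\Theta'} D_2^2(\tilde{Q}_{n,\theta}, Q_{n,\theta})
&\le \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{1}{n}\sum_{t=1}^{n} \left(\frac{\tilde\eta_t(\theta)}{\tilde\sigma_t(\theta)} - \frac{\eta_t(\theta)}{\sigma_t(\theta)}\right)^2 \\
&= \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{1}{n}\sum_{t=1}^{n} \left(\frac{\eta_t(\theta)}{\sigma_t(\theta)} - \frac{\eta_t(\theta)}{\tilde\sigma_t(\theta)} + \frac{\eta_t(\theta)}{\tilde\sigma_t(\theta)} - \frac{\tilde\eta_t(\theta)}{\tilde\sigma_t(\theta)}\right)^2 \\
&\le \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{2}{n}\sum_{t=1}^{n} \left\{\frac{\eta_t^2(\theta)\,|\tilde\sigma_t^2(\theta) - \sigma_t^2(\theta)|}{\sigma_t^2(\theta)\,\tilde\sigma_t^2(\theta)} + \frac{(\eta_t(\theta) - \tilde\eta_t(\theta))^2}{\tilde\sigma_t^2(\theta)}\right\} \\
&\le \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{2}{n}\sum_{t=1}^{n} \eta_t^2(\theta)\,|\sigma_t^2(\theta) - \tilde\sigma_t^2(\theta)| + \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{2}{n}\sum_{t=1}^{n} (\eta_t(\theta) - \tilde\eta_t(\theta))^2.
\end{aligned}$$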

Here we used the fact that $\theta \in \Theta'$, so both $\tilde\sigma_t^2(\theta)$ and $\sigma_t^2(\theta)$ are greater than or equal to one. For the first term, we can apply (5.21) and a similar argument in the proof of Theorem 3.1 of Francq and Zakoïan (2004) to prove that it approaches zero almost surely. For the second term, (5.20) entails its almost sure convergence to zero.
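The first inequality in point (a) bounds the squared 2nd Mallows distance between two empirical measures by the average squared difference of their atoms paired by time index. The Python sketch below illustrates this under the standard fact that, on the real line, the optimal coupling of two equal-size, equal-weight empirical measures pairs their order statistics. The function names `mallows2_sq_empirical` and `index_coupling_sq` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mallows2_sq_empirical(x, y):
    """Squared 2nd Mallows (Wasserstein-2) distance between the empirical
    measures of x and y, assuming both have the same number of atoms.
    On the real line the optimal coupling matches sorted values."""
    xs = np.sort(np.asarray(x, dtype=float))
    ys = np.sort(np.asarray(y, dtype=float))
    return float(np.mean((xs - ys) ** 2))


def index_coupling_sq(x, y):
    """Cost of the (generally suboptimal) coupling that pairs x_t with y_t."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))
```

Because pairing by index is just one admissible coupling, `mallows2_sq_empirical(x, y)` never exceeds `index_coupling_sq(x, y)`, which is the inequality used in the first line of the display above.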

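(b) Asymptotic irrelevance of the initial values - II. Utilizing the inequality $|\log x - \log y| \le \frac{|x - y|}{\min(x, y)}$ for $x, y > 0$ and (5.21), one has that almost surely

$$\begin{aligned}
\limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{1}{2n}\left|\sum_{t=1}^{n} \log\tilde\sigma_t^2(\theta) - \sum_{t=1}^{n} \log\sigma_t^2(\theta)\right|
&\le \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{1}{2n}\sum_{t=1}^{n} |\tilde\sigma_t^2(\theta) - \sigma_t^2(\theta)| \\
&\le \limsup_{n\to\infty} \sup_{\theta\in\Theta'} \frac{K}{2n}\sum_{t=1}^{n} \rho^t \sum_{i=1-r}^{t-1} \left(|\eta_i(\theta)| + 1\right).
\end{aligned}$$

The rest of the proof is similar to that of point (a).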
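The logarithmic inequality used in point (b) is elementary; a short justification is sketched here for completeness. It assumes only that $x, y > 0$, and the last line uses the fact, noted in point (a), that both variance sequences are at least one on $\Theta'$.

```latex
% Without loss of generality take 0 < y \le x. Then
\log x - \log y = \int_y^x \frac{dz}{z} \le \frac{x - y}{y} = \frac{|x - y|}{\min(x, y)} .
% The case x < y follows by symmetry. With
% x = \tilde\sigma_t^2(\theta) \ge 1 and y = \sigma_t^2(\theta) \ge 1 this gives
\left| \log \tilde\sigma_t^2(\theta) - \log \sigma_t^2(\theta) \right|
  \le \left| \tilde\sigma_t^2(\theta) - \sigma_t^2(\theta) \right| .
```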

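(c) Existence of the logarithmic expectation over $\Theta'$. From Proposition 1 of Francq and Zakoïan (2004), there exists a $u \in (0, 1/2)$ with $\mathbb{E}\,\eta_1^{2u}(\theta_0') < \infty$. Jensen's inequality and the subadditivity of the function $f(z) = z^u$, $z \in (0, \infty)$, entail that for any $\theta \in \Theta'$,

$$\mathbb{E}|\log\sigma_1^2(\theta)| = \mathbb{E}\log\sigma_1^2(\theta) \le \frac{1}{u}\log\left(\mathcal{B}_\theta^{-u}(1) + \sum_{i=1}^{\infty} |\gamma_i(\theta)|^u\, \mathbb{E}\left(\eta_{1-i}^{2u}(\theta)\right)\right),$$

where $\{\gamma_i(\theta)\}_{i=1}^{\infty}$ are given as

$$\gamma_i(\theta) = \frac{1}{i!}\, \frac{d^i}{dz^i}\left.\frac{\mathcal{A}_\theta(z)}{\mathcal{B}_\theta(z)}\right|_{z=0}, \quad \text{for } i = 1, 2, \ldots$$

Now because all the roots of $\mathcal{B}_\theta(z) = 0$ have modulus greater than one and $\Theta'$ is compact, we can find two constants $K > 0$ and $0 < \rho < 1$ such that $\sup_{\theta\in\Theta'} |\gamma_i(\theta)| \le K\rho^i$ for every $i \in \mathbb{N}$. It therefore follows that $\sum_{i=1}^{\infty} |\gamma_i(\theta)|^u \le \frac{K^u\rho^u}{1 - \rho^u} < \infty$. Similarly, it can be shown that $\sup_{\theta\in\Theta'} \mathbb{E}(\eta_1^{2u}(\theta)) < \infty$. Therefore, $\mathbb{E}|\log\sigma_1^2(\theta)|$ is bounded over $\Theta'$.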
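The coefficients $\gamma_i(\theta)$ in point (c) are the power series coefficients of a ratio of two polynomials, so they can be computed by a simple recursion rather than by differentiation. The Python sketch below assumes the common convention $\mathcal{A}_\theta(z) = \sum_{i=1}^{r} \alpha_i z^i$ and $\mathcal{B}_\theta(z) = 1 - \sum_{j=1}^{s} \beta_j z^j$, which this section does not restate. The function name `garch_gamma` is hypothetical.

```python
def garch_gamma(alpha, beta, n_terms):
    """Power-series coefficients gamma_1, ..., gamma_{n_terms} of A(z)/B(z),
    assuming A(z) = sum_i alpha_i z^i and B(z) = 1 - sum_j beta_j z^j.

    Matching coefficients of z^i in B(z) * sum_k gamma_k z^k = A(z) gives
    gamma_i = alpha_i + sum_{j=1}^{min(i, s)} beta_j * gamma_{i-j}, gamma_0 = 0.
    """
    gamma = [0.0] * (n_terms + 1)  # gamma[0] = 0 because A(0) = 0
    for i in range(1, n_terms + 1):
        a_i = alpha[i - 1] if i <= len(alpha) else 0.0
        gamma[i] = a_i + sum(beta[j - 1] * gamma[i - j]
                             for j in range(1, min(i, len(beta)) + 1))
    return gamma[1:]
```

For a GARCH(1,1) specification the recursion reduces to $\gamma_i = \alpha_1\beta_1^{i-1}$, so the geometric bound $|\gamma_i| \le K\rho^i$ holds with $K = \alpha_1/\beta_1$ and $\rho = \beta_1$.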

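(d) The lower bound. It is easy to check that $Q_{n,\theta_0'} = \frac{1}{n}\sum_{t=1}^{n} \delta_{\sqrt{c_0}\,\varepsilon_t}$. Denote by $Q_{\theta_0'}$ the distribution corresponding to $\sqrt{c_0}\,\varepsilon_1$. Then $D_1(Q_{n,\theta_0'}, Q_{\theta_0'}) \stackrel{a.s.}{\to} 0$. From point (a), we deduce $D_1(\tilde{Q}_{n,\theta_0'}, Q_{\theta_0'}) \stackrel{a.s.}{\to} 0$. Now use points (b), (c) and the pointwise ergodic theorem to see

$$\lim_{n\to\infty} \frac{1}{2n}\sum_{t=1}^{n} \log\tilde\sigma_t^2(\theta_0') = \frac{1}{2}\mathbb{E}\log\sigma_1^2(\theta_0'), \quad \text{a.s.}$$

It then follows from the continuity and the affine equivariance of $L(\cdot)$ that

$$\begin{aligned}
\liminf_{n\to\infty} \sup_{\Phi\times\Theta'} \Lambda_n(\phi, \theta)
&\ge \liminf_{n\to\infty} \sup_{\phi\in\Phi} \Lambda_n(\phi, \theta_0')
= \liminf_{n\to\infty} L(\tilde{Q}_{n,\theta_0'}) - \limsup_{n\to\infty} \frac{1}{2n}\sum_{t=1}^{n} \log\tilde\sigma_t^2(\theta_0') \\
&= L(Q_{\theta_0'}) - \frac{1}{2}\mathbb{E}\log\sigma_1^2(\theta_0') = L(Q_0) - \frac{1}{2}\mathbb{E}\log\sigma_1^2(\theta_0).
\end{aligned}$$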
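The last equality in point (d) can be checked term by term. The sketch below assumes that $\theta_0'$ is obtained from $\theta_0$ by normalizing the GARCH intercept to one and dividing the ARCH coefficients by $c_0$, as in the definition given under point (g), and that the affine equivariance of Lemma 5.2 takes the usual form $L(\text{law of } aY) = L(\text{law of } Y) - \log a$ for $a > 0$. Both are stated here as assumptions, since neither is spelled out in this part of the proof.

```latex
% The ARMA part is unchanged, so \eta_t(\theta_0') = \eta_t(\theta_0).
% Dividing (5.19) at \theta_0 by c_0 gives the recursion at \theta_0':
\frac{\sigma_t^2(\theta_0)}{c_0}
  = 1 + \sum_{i=1}^{r} \frac{\alpha_{0i}}{c_0}\, \eta_{t-i}^2(\theta_0)
      + \sum_{i=1}^{s} \beta_{0i}\, \frac{\sigma_{t-i}^2(\theta_0)}{c_0},
\qquad \text{so} \qquad
\sigma_t^2(\theta_0') = \frac{\sigma_t^2(\theta_0)}{c_0} .

% Q_{\theta_0'} is the law of \sqrt{c_0}\,\varepsilon_1, hence
L(Q_{\theta_0'}) = L(Q_0) - \tfrac{1}{2} \log c_0 .

% Combining the two displays:
L(Q_{\theta_0'}) - \tfrac{1}{2}\, \mathbb{E} \log \sigma_1^2(\theta_0')
  = L(Q_0) - \tfrac{1}{2} \log c_0
    - \tfrac{1}{2}\, \mathbb{E} \log \sigma_1^2(\theta_0) + \tfrac{1}{2} \log c_0
  = L(Q_0) - \tfrac{1}{2}\, \mathbb{E} \log \sigma_1^2(\theta_0) .
```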

(e) Uniform ergodic theorem. The proof follows that of the uniform law of large numbers, where one combines a standard bracketing idea with a compactness argument. We omit the proof for brevity.

(f) Uniform weak convergence. The proof is similar to that given for Theorem 2.3. One may refer to point (c) in the proof of Theorem 2.3 for more details.

(g) Convergence of $\hat\theta_n'$. To verify the assertion, it suffices to consider a sequence of fixed observations $X_1, X_2, \ldots$ such that (a)-(f) hold true. Consider any convergent subsequence of $\hat\theta_n'$, denoted by $\hat\theta_{n(j)}' \to \theta^*$. We are about to show that $\theta^* = \theta_0'$. By compactness, $\theta^* \in \Theta'$. A slight variant of point (f) together with point (a) entails

$$\lim_{j\to\infty} D_L\left(\tilde{Q}_{n(j),\hat\theta_{n(j)}'}, Q_{\theta^*}\right) = 0.$$



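For all $\theta \in \Theta'$,

$$\frac{\eta_1(\theta)}{\sigma_1(\theta)} = \left(\sqrt{c_0}\,\varepsilon_1 + \frac{\eta_1(\theta) - \eta_1(\theta_0')}{\sigma_1(\theta_0')}\right) \frac{\sigma_1(\theta_0')}{\sigma_1(\theta)} =: (R_1 + R_2)R_3,$$

where $R_1$ is independent of both $R_2$ and $R_3$. So by Lemma 5.2 (affine equivariance of $L(\cdot)$), Lemma 5.3 and Corollary 5.7,

$$\limsup_{j\to\infty} L\left(\tilde{Q}_{n(j),\hat\theta_{n(j)}'}\right) \le L(Q_{\theta^*}) \le L(Q_0) - \frac{1}{2}\log c_0 - \mathbb{E}\log\sigma_1(\theta_0') + \mathbb{E}\log\sigma_1(\theta^*). \tag{5.22}$$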



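Furthermore, it is easy to check from points (b) and (e) that

$$\lim_{j\to\infty} \frac{1}{n(j)}\sum_{t=1}^{n(j)} \log\tilde\sigma_t\left(\hat\theta_{n(j)}'\right) = \mathbb{E}\log\sigma_1(\theta^*).$$

Combining those two elements together gives that

$$\begin{aligned}
\limsup_{j\to\infty} \sup_{\Phi\times\Theta'} \Lambda_{n(j)}(\phi, \theta)
&\le \limsup_{j\to\infty} L\left(\tilde{Q}_{n(j),\hat\theta_{n(j)}'}\right) - \liminf_{j\to\infty} \frac{1}{2n(j)}\sum_{t=1}^{n(j)} \log\tilde\sigma_t^2\left(\hat\theta_{n(j)}'\right) \\
&\le L(Q_0) - \frac{1}{2}\log c_0 - \frac{1}{2}\mathbb{E}\log\sigma_1^2(\theta_0') = L(Q_0) - \frac{1}{2}\mathbb{E}\log\sigma_1^2(\theta_0).
\end{aligned}$$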

In light of point (d), the equality is enforced in (5.22). So by Corollary 5.7 again, there must exist constants $C_1$ and $C_2 \in (0, \infty)$ such that

$$\mathbb{P}\left(\frac{\eta_1(\theta^*) - \eta_1(\theta_0')}{\sigma_1(\theta_0')} = C_1\right) = 1, \tag{5.23}$$

$$\mathbb{P}\left(\frac{\sigma_1^2(\theta_0')}{\sigma_1^2(\theta^*)} = C_2\right) = 1. \tag{5.24}$$

Note that for every $\theta \in \Theta'$, one can express $\eta_1(\theta)$ as a linear combination of $X_{1-i}$, $i \ge 0$. Furthermore, one can write $\sigma_1^2(\theta) - 1/\mathcal{B}_\theta(1)$ as a linear combination of $X_{1-i}X_{1-j}$, $i, j \ge 1$. We claim that $C_1 = 0$ and $\eta_1(\theta^*) = \eta_1(\theta_0')$ with probability one, because otherwise (5.23) would imply the existence of a constant linear combination of $X_{1-i}X_{1-j}$ with $i, j \ge 0$, which would violate assumption (B.1) (or even (B.1*)). By the same argument given in the proof of Theorem 2.3, we get $A_{\theta^*} = A_{\theta_0'}$ and $B_{\theta^*} = B_{\theta_0'}$.

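Moreover, it follows from (5.23) and (5.24) that with probability one

$$\left(C_2\frac{\mathcal{A}_{\theta^*}(B)}{\mathcal{B}_{\theta^*}(B)} - \frac{\mathcal{A}_{\theta_0'}(B)}{\mathcal{B}_{\theta_0'}(B)}\right)\eta_1^2(\theta_0') = \frac{1}{\mathcal{B}_{\theta_0'}(1)} - \frac{C_2}{\mathcal{B}_{\theta^*}(1)}.$$

It can be seen that this equality holds if and only if

$$C_2\frac{\mathcal{A}_{\theta^*}(z)}{\mathcal{B}_{\theta^*}(z)} = \frac{\mathcal{A}_{\theta_0'}(z)}{\mathcal{B}_{\theta_0'}(z)}, \ \forall |z| \le 1, \quad \text{and} \quad \frac{1}{\mathcal{B}_{\theta_0'}(1)} = \frac{C_2}{\mathcal{B}_{\theta^*}(1)}.$$

Under assumption (B.5), it implies $\mathcal{B}_{\theta^*} = \mathcal{B}_{\theta_0'}$, which consequently entails $C_2 = 1$ and $\mathcal{A}_{\theta^*} = \mathcal{A}_{\theta_0'}$.

Therefore, $\theta^* = \theta_0'$. Finally, since $\Theta'$ is compact and the convergent subsequence is picked arbitrarily, $\hat\theta_n' \stackrel{a.s.}{\to} \theta_0'$, as desired.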

(h) Convergence of $\hat{c}_n$. In view of point (a), it suffices to show $\mu_2(Q_{n,\hat\theta_n'}) \stackrel{a.s.}{\to} c_0$. One can follow a similar argument used for point (e) in the proof of Theorem 2.3 to establish this point. Moreover, by the continuous mapping theorem, $\hat\theta_n \stackrel{a.s.}{\to} \theta_0$.

(i) Convergence of $\hat{f}_n$. A close scrutiny reveals that we have already established firstly the convergence of $Q_{n,\hat\theta_n'}$ to $Q_{\theta_0'}$ in law in the proof of point (g), and secondly, $\mu_2(Q_{n,\hat\theta_n'}) \stackrel{a.s.}{\to} \mu_2(Q_{\theta_0'})$ in point (h). Theorem 6.9 of Villani (2009) establishes the convergence of $Q_{n,\hat\theta_n'}$ to $Q_{\theta_0'}$ in the 2nd Mallows distance, which implies the convergence in the 1st Mallows distance. Again by point (a), $D_1(\tilde{Q}_{n,\hat\theta_n'}, Q_{\theta_0'}) \stackrel{a.s.}{\to} 0$. Now one can use Lemma 5.4(i) and Lemma 5.3 to obtain $\int |\hat{f}_n(x) - f_0(x)|\, dx \stackrel{a.s.}{\to} 0$. Finally, one can apply Proposition 2 of Cule and Samworth (2010) and the dominated convergence theorem to see (3.4).
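The step in point (i) from the 2nd to the 1st Mallows distance is a one-line consequence of the Cauchy-Schwarz inequality. A minimal derivation follows, assuming the usual coupling definition $D_k(P, Q) = \inf \left(\mathbb{E}|X - Y|^k\right)^{1/k}$, with the infimum taken over all pairs $(X, Y)$ having marginals $P$ and $Q$; that definition is not restated in this section.

```latex
% For any coupling (X, Y) with X ~ P and Y ~ Q, Cauchy-Schwarz gives
\mathbb{E}|X - Y| \le \left( \mathbb{E}|X - Y|^2 \right)^{1/2} .
% Taking the infimum over all couplings on both sides yields
D_1(P, Q) \le D_2(P, Q) ,
% so D_2(Q_n, Q) \to 0 implies D_1(Q_n, Q) \to 0.
```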



□